generic options for explain

Started by Robert Haasover 16 years ago75 messages

robertmhaas@gmail.com

over 16 years ago

1 attachment(s)

Well, here we are! Yet another thread about some piece of information
that's omitted from EXPLAIN and can't easily be added because there
are a zillion things we want to add to EXPLAIN and it's not OK to bury
the user[1]http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de! I've long been of the opinion that the right way to fix
this problem is to extend the syntax with some sort of extensible
options syntax[2]http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com. The current "EXPLAIN [ANALYZE] [VERBOSE] <query>"
syntax does not scale to large numbers of options - it requires that
the options occur in a fixed order, and that the option names all be
keywords. Having gotten throughly fed up with having this
conversation for the ump-teenth time, I wrote a patch to introduce
just such a syntax. See attached.

What I did is borrowed the generic options stuff that Peter Eisentraut
introduced for FOREIGN DATA WRAPPER et. al, so you can write:

EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
e.g. EXPLAIN (ANALYZE "on") query

As written, this patch doesn't introduce any actual new functionality,
but I think it's pretty easy to see how we could build on the syntax
to add things like different types of output formats, different types
of instrumentation, etc. A few other random notes:

- This currently lacks documentation. If we have any consensus that
this is a reasonable approach, I'll add some.
- I noticed that we currently accept as a top-level SQL command an
arbitrarily parenthesized SELECT statement, like ((SELECT 3)). But
you can't put parentheses around any other type of statement. Even
more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
me makes no sense at all. But that's neither here nor there as far as
this patch is concerned, except that it required some minor grammar
hackery and a long comment explaining the hackery.

Thoughts?

...Robert

[1]: http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de
[2]: http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com

Attachments:

explain_options.v1.patchtext/x-diff; charset=US-ASCII; name=explain_options.v1.patchDownload

*** a/src/backend/nodes/makefuncs.c
--- b/src/backend/nodes/makefuncs.c
***************
*** 17,24 ****
--- 17,26 ----
  
  #include "catalog/pg_type.h"
  #include "nodes/makefuncs.h"
+ #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  
+ static bool parseBooleanGenericOption(DefElem *opt);
  
  /*
   * makeA_Expr -
***************
*** 385,387 **** makeDefElemExtended(char *namespace, char *name, Node *arg,
--- 387,435 ----
  
  	return res;
  }
+ 
+ /*
+  * makeExplain -
+  *  build an ExplainStmt node by parsing the generic options list
+  */
+ ExplainStmt *
+ makeExplain(List *options, Node *query)
+ {
+ 	ExplainStmt *n = makeNode(ExplainStmt);
+ 	ListCell *lc;
+ 
+ 	foreach (lc, options)
+ 	{
+ 		DefElem *opt = lfirst(lc);
+ 		if (!strcmp(opt->defname, "analyze"))
+ 			n->analyze = parseBooleanGenericOption(opt);
+ 		else if (!strcmp(opt->defname, "verbose"))
+ 			n->verbose = parseBooleanGenericOption(opt);
+ 		else
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_UNDEFINED_PARAMETER),
+ 				 errmsg("unknown EXPLAIN option: %s", opt->defname)));
+ 	}
+ 
+ 	n->query = query;
+ 	return n;
+ }
+ 
+ /*
+  * parseBooleanGenericOption -
+  * 	Interpret a generic option as a boolean.  Currently, a generic_option_arg
+  * 	can only be a string, so we don't need to worry about anything else.
+  */
+ static bool
+ parseBooleanGenericOption(DefElem *opt)
+ {
+ 	bool res;
+ 
+ 	Assert(IsA(opt->arg, String));
+ 	if (!parse_bool(strVal(opt->arg), &res))
+ 		ereport(ERROR,
+ 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 			 errmsg("parameter \"%s\" requires a Boolean value",
+ 				opt->defname)));
+ 	return res;
+ }
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***************
*** 6436,6452 **** opt_name_list:
   *
   *		QUERY:
   *				EXPLAIN [ANALYZE] [VERBOSE] query
   *
   *****************************************************************************/
  
  ExplainStmt: EXPLAIN opt_analyze opt_verbose ExplainableStmt
  				{
! 					ExplainStmt *n = makeNode(ExplainStmt);
  					n->analyze = $2;
  					n->verbose = $3;
- 					n->query = $4;
  					$$ = (Node *)n;
  				}
  		;
  
  ExplainableStmt:
--- 6436,6456 ----
   *
   *		QUERY:
   *				EXPLAIN [ANALYZE] [VERBOSE] query
+  *				EXPLAIN ( generic options ) query
   *
   *****************************************************************************/
  
  ExplainStmt: EXPLAIN opt_analyze opt_verbose ExplainableStmt
  				{
! 					ExplainStmt *n = makeExplain(NIL, (Node *) $4);
  					n->analyze = $2;
  					n->verbose = $3;
  					$$ = (Node *)n;
  				}
+ 		|	EXPLAIN '(' generic_option_list ')' ExplainableStmt
+ 				{
+ 					$$ = (Node *) makeExplain((List *) $3, (Node *) $5);
+ 				}
  		;
  
  ExplainableStmt:
***************
*** 6459,6467 **** ExplainableStmt:
  			| ExecuteStmt					/* by default all are $$=$1 */
  		;
  
  opt_analyze:
  			analyze_keyword			{ $$ = TRUE; }
! 			| /* EMPTY */			{ $$ = FALSE; }
  		;
  
  /*****************************************************************************
--- 6463,6484 ----
  			| ExecuteStmt					/* by default all are $$=$1 */
  		;
  
+ /*
+  * The precedence declaration for the opt_analyze EMPTY case, below, is
+  * necessary to prevent a shift/reduce conflict in the second production for
+  * ExplainStmt, above.  Otherwise, when the parser encounters "EXPLAIN (", it
+  * can't tell whether the "(" is the beginning of a SelectStmt or the beginning
+  * of the options list.  The precedence declaration below forces the latter
+  * interpretation.
+  *
+  * It might seem that we could get away with simply changing the definition of
+  * ExplainableStmt to use select_without_parens rather than SelectStmt, but
+  * that does not work, because select_without_parens produces expressions such
+  * as "(SELECT NULL) ORDER BY 1" that we interpret as legal queries.
+  */
  opt_analyze:
  			analyze_keyword			{ $$ = TRUE; }
! 			| /* EMPTY */			%prec UMINUS { $$ = FALSE; }	
  		;
  
  /*****************************************************************************
*** a/src/include/nodes/makefuncs.h
--- b/src/include/nodes/makefuncs.h
***************
*** 69,72 **** extern DefElem *makeDefElem(char *name, Node *arg);
--- 69,74 ----
  extern DefElem *makeDefElemExtended(char *namespace, char *name, Node *arg,
  									DefElemAction defaction);
  
+ extern ExplainStmt *makeExplain(List *options, Node *query);
+ 
  #endif   /* MAKEFUNC_H */

Pavel Stehule

pavel.stehule@gmail.com

over 16 years ago

In reply to: Robert Haas (#1)

Re: generic options for explain

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

regards
Pavel Stehule

2009/5/24 Robert Haas <robertmhaas@gmail.com>:

Show quoted text

Well, here we are! Yet another thread about some piece of information
that's omitted from EXPLAIN and can't easily be added because there
are a zillion things we want to add to EXPLAIN and it's not OK to bury
the user[1]! I've long been of the opinion that the right way to fix
this problem is to extend the syntax with some sort of extensible
options syntax[2]. The current "EXPLAIN [ANALYZE] [VERBOSE] <query>"
syntax does not scale to large numbers of options - it requires that
the options occur in a fixed order, and that the option names all be
keywords. Having gotten throughly fed up with having this
conversation for the ump-teenth time, I wrote a patch to introduce
just such a syntax. See attached.

What I did is borrowed the generic options stuff that Peter Eisentraut
introduced for FOREIGN DATA WRAPPER et. al, so you can write:

EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
e.g. EXPLAIN (ANALYZE "on") query

As written, this patch doesn't introduce any actual new functionality,
but I think it's pretty easy to see how we could build on the syntax
to add things like different types of output formats, different types
of instrumentation, etc. A few other random notes:

- This currently lacks documentation. If we have any consensus that
this is a reasonable approach, I'll add some.
- I noticed that we currently accept as a top-level SQL command an
arbitrarily parenthesized SELECT statement, like ((SELECT 3)). But
you can't put parentheses around any other type of statement. Even
more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
me makes no sense at all. But that's neither here nor there as far as
this patch is concerned, except that it required some minor grammar
hackery and a long comment explaining the hackery.

Thoughts?

...Robert

[1] http://archives.postgresql.org/message-id/4A16A8AF.2080508@anarazel.de
[2] http://archives.postgresql.org/message-id/603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 16 years ago

In reply to: Robert Haas (#1)

Re: generic options for explain

Hi Robert,

On 05/24/2009 02:47 AM, Robert Haas wrote:

Well, here we are! Yet another thread about some piece of information
that's omitted from EXPLAIN and can't easily be added because there
are a zillion things we want to add to EXPLAIN and it's not OK to bury
the user[1]! I've long been of the opinion that the right way to fix
this problem is to extend the syntax with some sort of extensible
options syntax[2]. The current "EXPLAIN [ANALYZE] [VERBOSE]<query>"
syntax does not scale to large numbers of options - it requires that
the options occur in a fixed order, and that the option names all be
keywords. Having gotten throughly fed up with having this
conversation for the ump-teenth time, I wrote a patch to introduce
just such a syntax. See attached.

What I did is borrowed the generic options stuff that Peter Eisentraut
introduced for FOREIGN DATA WRAPPER et. al, so you can write:

EXPLAIN (option_name1 "option_value1", option_name2 "option_value2") query
e.g. EXPLAIN (ANALYZE "on") query

Beeing the latest cause for the frustration leading to this patch I
obviously would like something like that - and I would gladly implement
some additional stats suggested by others(if implementable in a
reasonable timeframe) if this approach is agreed uppon.

- I noticed that we currently acce pt as a top-level SQL command an
arbitrarily parenthesized SELECT statement, like ((SELECT 3)). But
you can't put parentheses around any other type of statement. Even
more oddly, we also accept things like (SELECT 3) ORDER BY 1, which to
me makes no sense at all.

I would guess that stems from supporting syntax like:
(SELECT 1)
UNION
(SELECT 2)
ORDER BY

and not wanting to introduce a special path for
(SELECT 1)
ORDER BY

For additional stats to be kept another discussion about appropriate,
extensible representation suitable for different output formats probably
would be needed - but thats a discussion for another day.

Andres

Dave Page

dpage@pgadmin.org

over 16 years ago

In reply to: Pavel Stehule (#2)

Re: generic options for explain

On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

I agree. We shouldn't be overloading EXPLAIN with a new option every
time someone thinks of something new they'd like to see. XML EXPLAIN
output would obviously be extensible and machine readable. We could
easily produce a library of XSLT stylesheets to format the output in
different ways without cluttering the server with extra code. The
benefits to applications that want to read the output are also pretty
obvious.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Dave Page (#4)

Re: generic options for explain

On Sun, May 24, 2009 at 8:44 AM, Dave Page <dpage@pgadmin.org> wrote:

On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

I don't understand - why do we want to switch from a command interface
to a functional interface? I was thinking:

EXPLAIN (format 'xml') query...
EXPLAIN (format 'json') query...

What you're proposing will certainly be harder to code as well as more
different from what we have now. My approach has the advantage of
being able to leverage the main parser to parse both the query and the
options; with this approach, you'll need to pass the query text back
to the main parser and then write separate code of some sort to parse
the options. Seems like extra work for no gain.

The only possible advantage I can see of a functional interface is
that the current implementation of EXPLAIN is treated as a utility
command, so you can't do something like this:

INSERT INTO foo (EXPLAIN SELECT 1);

In my mind, however, fixing that would be preferable to (and probably
easier than) changing the whole syntax.

I agree. We shouldn't be overloading EXPLAIN with a new option every
time someone thinks of something new they'd like to see. XML EXPLAIN
output would obviously be extensible and machine readable. We could
easily produce a library of XSLT stylesheets to format the output in
different ways without cluttering the server with extra code. The
benefits to applications that want to read the output are also pretty
obvious.

Well, the major benefit of the existing EXPLAIN output is that you
don't need an XSLT stylesheet to read it. You can just be sitting
there in psql and do an EXPLAIN, and look at the results, and solve
your problem. As we add options to EXPLAIN, I would like to see us
support those in both the plaintext output and the XML output (and
hopefully the JSON output) using the same syntax.

If, for example, I need more information on what my hash joins are
doing, I'd like to be able to do:

EXPLAIN ('hash_detail', 'on') query...

...and have it tell me the # of buckets and batches for each hash join
and whatever other information might be useful in that context. I
*don't* want to have the solution to that problem be: run
explain_xml() on your query, then install an XSLT parser, then use
this pregenerated XSLT stylesheet to extract your data from the XML
document you got back from step #1, then view the resulting HTML in
your web browser. That may be a good solution for someone, but it
certainly doesn't sound convenient for anyone who works primarily at
the command line.

It's also worth noting that there are some kinds of statistics (for
example, I/O statistics) which can't be gathered without significantly
impacting the performance of the query. So you can't just take the
approach of turning on every possible instrumentation when XML output
is requested.

I do agree with your point that even with an extensible options
syntax, we can't just add an option for any old thing that someone
wants. We should try to come up with a somewhat comprehensive list of
types of instrumentation that someone might want and then group them
into categories, with one toggle per category.

...Robert

Pavel Stehule

pavel.stehule@gmail.com

over 16 years ago

In reply to: Robert Haas (#5)

Re: generic options for explain

2009/5/24 Robert Haas <robertmhaas@gmail.com>:

On Sun, May 24, 2009 at 8:44 AM, Dave Page <dpage@pgadmin.org> wrote:

On Sun, May 24, 2009 at 12:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

why we develop a new syntax?

we should have a secondary function explain_query(query_string,
option) that returns setof some. Next function should be
explain_query_xml. I thing so for typical use EXPLAIN statement is
enough. And for machine procession some new function should be
perfect.

I don't understand - why do we want to switch from a command interface
to a functional interface? I was thinking:

EXPLAIN (format 'xml') query...
EXPLAIN (format 'json') query...

Personally, I thing, so this format is strange - and is some MySQLism.
There are statements like SHOW TABLES and similar. My proposal doesn't
change of current EXPLAIN. The sence is don't more complicate current
syntax and do things simply. Function has more advantage then one:

a) don't need changes in parser
b) simply add new variants
c) simply add new parameters

What you're proposing will certainly be harder to code as well as more
different from what we have now. My approach has the advantage of
being able to leverage the main parser to parse both the query and the
options; with this approach, you'll need to pass the query text back
to the main parser and then write separate code of some sort to parse
the options. Seems like extra work for no gain.

The only possible advantage I can see of a functional interface is
that the current implementation of EXPLAIN is treated as a utility
command, so you can't do something like this:

INSERT INTO foo (EXPLAIN SELECT 1);

In my mind, however, fixing that would be preferable to (and probably
easier than) changing the whole syntax.

I agree. We shouldn't be overloading EXPLAIN with a new option every
time someone thinks of something new they'd like to see. XML EXPLAIN
output would obviously be extensible and machine readable. We could
easily produce a library of XSLT stylesheets to format the output in
different ways without cluttering the server with extra code. The
benefits to applications that want to read the output are also pretty
obvious.

Well, the major benefit of the existing EXPLAIN output is that you
don't need an XSLT stylesheet to read it. You can just be sitting
there in psql and do an EXPLAIN, and look at the results, and solve
your problem. As we add options to EXPLAIN, I would like to see us
support those in both the plaintext output and the XML output (and
hopefully the JSON output) using the same syntax.

If, for example, I need more information on what my hash joins are
doing, I'd like to be able to do:

EXPLAIN ('hash_detail', 'on') query...

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

Pavel

Show quoted text

...and have it tell me the # of buckets and batches for each hash join
and whatever other information might be useful in that context. I
*don't* want to have the solution to that problem be: run
explain_xml() on your query, then install an XSLT parser, then use
this pregenerated XSLT stylesheet to extract your data from the XML
document you got back from step #1, then view the resulting HTML in
your web browser. That may be a good solution for someone, but it
certainly doesn't sound convenient for anyone who works primarily at
the command line.

It's also worth noting that there are some kinds of statistics (for
example, I/O statistics) which can't be gathered without significantly
impacting the performance of the query. So you can't just take the
approach of turning on every possible instrumentation when XML output
is requested.

I do agree with your point that even with an extensible options
syntax, we can't just add an option for any old thing that someone
wants. We should try to come up with a somewhat comprehensive list of
types of instrumentation that someone might want and then group them
into categories, with one toggle per category.

...Robert

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Pavel Stehule (#6)

Re: generic options for explain

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more votes.

...Robert

Pavel Stehule

pavel.stehule@gmail.com

over 16 years ago

In reply to: Robert Haas (#7)

Re: generic options for explain

2009/5/24 Robert Haas <robertmhaas@gmail.com>:

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

I prefere little bit different syntax for named params like param =
value, or param => value, but syntax with AS is one confirmed.

See to me THAT is a really strange syntax, so I guess we need some more votes.

Show quoted text

...Robert

Andrew Dunstan

andrew@dunslane.net

over 16 years ago

In reply to: Robert Haas (#7)

Re: generic options for explain

Robert Haas wrote:

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more votes.

Both of these seem both odd an unnecessary. Why not just have a setting
called, say, explain_format which governs the output?

set explain_format = 'xml, verbose';
explain select * from foo;

No new function or syntax would be required.

cheers

andrew

#10

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Andrew Dunstan (#9)

Re: generic options for explain

On Sun, May 24, 2009 at 11:57 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).
select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more
votes.

Both of these seem both odd an unnecessary. Why not just have a setting
called, say, explain_format which governs the output?

set explain_format = 'xml, verbose';
explain select * from foo;

No new function or syntax would be required.

Well, then you have to issue two commands to do one thing. I don't
much like the practice of using GUCs to control behavior that you may
only want for the duration of one statement.

The fundamental problem here, at least as it seems to me, is that we
want more options for EXPLAIN, but the current syntax won't support
it, because it requires making everything a keyword (and fixing the
order). So we can either add enough punctuation to de-confuse the
parser (which is what I tried to do) or we can switch to a totally
different method of controlling EXPLAIN behavior (as you and Pavel are
advocating).

I wouldn't mind having a GUC to set the *default* explain behavior -
but I'd still like to be able to override it for a particular command
if I so choose. And that's not going to be possible with your syntax:
if explain_format is set to 'xml, verbose' and I want plain text
output for one command, how do I get it? Presumably I have to change
explain_format, run my EXPLAIN, and then change it back again. Blech!

My initial thought was to just use a list of key words for the
parameters, like this:

EXPLAIN (XML, VERBOSE) query...

...but I decided that wasn't a great idea, because it means that every
parameter has to be a boolean, which is probably more limiting than we
want to be. It also gets confusing because some parameters (like
"XML" and "JSON") are incompatible while others (like "HASH_DETAIL"
and "MERGE_DETAIL") can be used together, but there's not a lot of
clue in the syntax itself that this is the case. On the other hand,
if you write:

EXPLAIN (FORMAT 'XML', FORMAT 'JSON') ...
EXPLAIN (HASH_DETAIL 'ON', MERGE_DETAIL 'ON') ...

...it's obvious that the first case is qualitatively different from
the second one. I think this makes it easier to code the behavior in
a way that is consistent and correct, and also easier to document for
the benefit of our users. Now, we can do this part of it even if we
ultimately end up with a different overall syntax, for example:

explain_query(...query..., 'xml' as format, true as hash_detail)

I do suspect this will end up being a more complicated patch, and as I
said before, I don't really see the point.

As a point of historical interest, way back when (early 1997), we used
to support this syntax:

EXPLAIN WITH [COST|PLAN|FULL] query...

...but of course that's now impossible because WITH can be the first
word of a select query.

...Robert

#11

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Robert Haas (#10)

Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

I wouldn't mind having a GUC to set the *default* explain behavior -
but I'd still like to be able to override it for a particular command
if I so choose. And that's not going to be possible with your syntax:
if explain_format is set to 'xml, verbose' and I want plain text
output for one command, how do I get it? Presumably I have to change
explain_format, run my EXPLAIN, and then change it back again. Blech!

You know about SET LOCAL, no? I don't think this argument is very
convincing.

On the other side of the coin, I'm strongly against inventing more than
one new output format for EXPLAIN, and so any argument that depends on
examples such as "xml vs json" is falling on deaf ears here. I think
that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
seem untenable. What other options than those do you really need?
Not ones to add or remove output fields; we'd expect the client to
ignore fields it doesn't care about.

regards, tom lane

#12

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#11)

Re: generic options for explain

On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I wouldn't mind having a GUC to set the *default* explain behavior -
but I'd still like to be able to override it for a particular command
if I so choose. And that's not going to be possible with your syntax:
if explain_format is set to 'xml, verbose' and I want plain text
output for one command, how do I get it? Presumably I have to change
explain_format, run my EXPLAIN, and then change it back again. Blech!

You know about SET LOCAL, no? I don't think this argument is very
convincing.

I completely fail to see how that helps me. It's not faster or fewer
commands to type:

begin transaction; set local whatever; explain blah; commit transaction;

than it is to type:

set whatever; explain blah; set oldwhatever;

On the other side of the coin, I'm strongly against inventing more than
one new output format for EXPLAIN, and so any argument that depends on
examples such as "xml vs json" is falling on deaf ears here. I think
that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
seem untenable. What other options than those do you really need?
Not ones to add or remove output fields; we'd expect the client to
ignore fields it doesn't care about.

It's not just about me; we've had many requests for new EXPLAIN features.

Personally, I'd like to see the number of buckets and batches that a
hash join uses (predicted and actual), and maybe (with ANALYZE) the
number of hash collisions. I'd like to see memory utilization
statistics (peak memory utilization of hash join, for example). Other
people have requested I/O statistics (which you objected to on the
grounds that it would be too much overhead, so clearly if we're ever
going to do it it will have to be optional), and most recently number
of tuples discarded by the filter condition. We've also had requests
to suppress some information (like costs) for planner regression
testing.

I really don't see the point in restricting the syntax of EXPLAIN in
this way. I don't know exactly what sorts of useful options people
will come up with, but I do know that as long as we have an extremely
limiting options syntax, they can all be shot down on the grounds that
including them in the default output is too cumbersome for regular use
(which is absolutely true). On the other hand, I think it's
incredibly naive to suppose that EXPLAIN XML is going to make anyone
very happy. There are only two ways this can work out:

1. We'll make EXPLAIN XML output everything and the kitchen sink. In
this case, we'll soon get complaints along the lines of: "I can't use
regular EXPLAIN because it doesn't include the field that I need, but
the output of EXPLAIN XML is so voluminous that I can't read through
it by hand."
-or-
2. We'll be very restrictive about adding fields to EXPLAIN XML just
as we are now for plain EXPLAIN, in which case we haven't solved
anything.

I think XML output format is a complete distraction from the real
issue here, which is that there are certain pieces of information that
are sometimes useful but are not useful enough to justify including
them in the EXPLAIN output 100% of the time. By just punting all that
stuff to EXPLAIN XML, we're just saying that we're not interested in
creating a workable set of options to allow users to pick and choose
the information they care about - so instead we're going to dump a
huge chunk of unreadable XML and then make it the user's problem to
find a tool that will extract the details that they care about. Boo,
hiss.

Anyway, I'm suprised by the reaction to this patch, but I'll drop it.
I would like to make the EXPLAIN syntax more powerful for command-line
use, and I'd implement XML format and JSON along the way just for
completeness. But I don't have much interest in creating an XML
output format that is the ONLY way of getting more information,
because I'm a command-line user and it does me no good at all. :-(

...Robert

#13

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Robert Haas (#12)

Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You know about SET LOCAL, no? I don't think this argument is very
convincing.

I completely fail to see how that helps me.

Mainly, what it does is avoid having to know exactly what the old
setting was.

regards, tom lane

#14

Joshua Tolley

eggyknap@gmail.com

over 16 years ago

In reply to: Andrew Dunstan (#9)

Re: generic options for explain

On Sun, May 24, 2009 at 11:57:13AM -0400, Andrew Dunstan wrote:

Robert Haas wrote:

EXPLAIN ('hash_detail', 'on') query...

Oops, I should have written EXPLAIN (hash_detail 'on') query... can't
follow my own syntax.

I am sorry - this is really strange syntax . Who will use this syntax?
For some parser is little bit better function call, than parametrized
statement. Some dificulties with options should be fixed with named
param (we are speaking about 8.5).

select explain_xml("select ...", true as hash_detail, ...)

See to me THAT is a really strange syntax, so I guess we need some more votes.

Both of these seem both odd an unnecessary. Why not just have a setting
called, say, explain_format which governs the output?

set explain_format = 'xml, verbose';
explain select * from foo;

No new function or syntax would be required.

A further possibility: Oracle's equivalent of EXPLAIN doesn't actually output
anything to the screen, but rather fills in a (temporary?) table somewhere with
details of the query plan. I mostly found this irritating when working with
Oracle, because each time I used it I had to look up an example query to
generate output like PostgreSQL's EXPLAIN, which is generally what I really
wanted. But since we'd still have the old EXPLAIN behavior available, perhaps
something such as an Oracle-like table filler would be useful.

Such a proposal doesn't answer the need to allow users to specify, for
performance and other reasons, the precise subset of statistics they're
interested in; for whatever it's worth, my current favorite contender in that
field is EXPLAIN (a, b, c) <query>.

- Josh / eggyknap

#15

Greg Smith

gsmith@gregsmith.com

over 16 years ago

In reply to: Pavel Stehule (#2)

Re: generic options for explain

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#16

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Greg Smith (#15)

Re: generic options for explain

Greg Smith <gsmith@gregsmith.com> writes:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are. So from a strict point of view
this only makes sense when the output format is designed to not depend
on row ordering to convey information. We could certainly invent such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.

regards, tom lane

#17

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#13)

Re: generic options for explain

On Sun, May 24, 2009 at 4:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You know about SET LOCAL, no? I don't think this argument is very
convincing.

I completely fail to see how that helps me.

Mainly, what it does is avoid having to know exactly what the old
setting was.

Ah, OK, I see what you're going for. That's not really what I'm
complaining about with that syntax, though....

...Robert

#18

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#16)

Re: generic options for explain

On Sun, May 24, 2009 at 6:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Smith <gsmith@gregsmith.com> writes:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are. So from a strict point of view
this only makes sense when the output format is designed to not depend
on row ordering to convey information. We could certainly invent such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.

The current output format basically prints out the node type and then
a collection of properties that are associated with that node, where
applicable: join type, scan direction, strategy, relation, alias,
startup cost, total cost, rows, width, loops, filter, join filter,
various types of condition (index/recheck/tid/merge/hash), sort key.
However, we tend to omit certain fields (particularly scan direction
and alias) when they contain information that isn't sufficiently
interesting. That's probably not a good idea for anything that's
intended for machine-parsing; I think for XML or JSON or
output-to-a-table we'd want to include those fields whenever possible.

With that minor complaint (and the difficulty of figuring out how to
avoid duplicating large portions of explain.c), XML or JSON output
doesn't seem that difficult. In JSON there aren't too many ways of
doing this that make sense. I would guess we'd want something like
this:

{ "node" : "Hash Join", "jointype" : "Inner", "startup_cost" :
"11.49", "total_cost" : "92.59", "rows" : "1877", "width" : "325",
'outer' : { "node" : "Seq Scan", <more stuff> }
'inner' : { "node" : "Hash", <more stuff> }
}

XML, being XML, has 10 ways of doing something when 1 is more than
sufficient, so there are a couple of ways we could go. I think this
is my favorite.

or you could just use one node type:

...and then there's this style:

<HashJoin>
<jointype>inner</jointype>
<startup_cost>11.49</startup_cost>
...
<outer/>
...
</outer>
<inner>
...
</inner>
</HashJoin>

...which is incredibly long and redundant, but some people who use XML
like such things. I'm sure there are other possibilities as well.

With respect to table output things are a little bit less
straightforward. There are two issues. With XML or JSON, any
properties that aren't relevant to a particular node can be omitted
altogether, whereas for a table the column list has to be consistent
throughout. We can just leave the unused columns as NULL, but it will
be a fairly wide table. Also, with apologies to Josh Berkus, we'll
need to introduce some sort of non-natural primary key to allow
children to be related to parents, because unlike XML and JSON,
there's no built-in way to make one object the child of another.

All that having been said, making EXPLAIN into a query (so that you
can do INSERT INTO foo (EXPLAIN SELECT query...) might be useful to
some people even without changing the output format at all. I think
you could throw a windowing function in there to at least tag each
line with its original position in the output, and some people might
want just that.

...Robert

#19

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Greg Smith (#15)

Re: generic options for explain

On Sun, May 24, 2009 at 6:05 PM, Greg Smith <gsmith@gregsmith.com> wrote:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look to
be extremely useful, maybe at that point it's worth talking about how to
integrate them into the parser. Starting with the parser changes rather
than the parts that actually do the work is backwards. If you do it the
other way around, at all times you have a patch that actually provides
immediate useful value were it to be committed.

Well, perhaps I ought to be asking what sort of features people would
like to see, other than variant output formats? Maybe if we can
develop some kind of wish list for EXPLAIN, it will become more
obvious what the best option syntax is.

...Robert

#20

Greg Smith

gsmith@gregsmith.com

over 16 years ago

In reply to: Tom Lane (#16)

Re: generic options for explain

On Sun, 24 May 2009, Tom Lane wrote:

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are.

Fair enough; I think Pavel and myself were presuming an implied "line
number" in the output there that, as you point out, doesn't actually
exist. The actual implementation detail there doesn't really change the
spirit of what I was trying to suggest though: that this feature should
get designed starting with the expected output and how to generate it,
then work backwards towards how you pass it parameters.

On that topic, I though Tom Raney was the latest to update code for the
XML output at least. Code and a presentation going over everything is at
http://web.cecs.pdx.edu/~raneyt/gsoc/ and there's a video at
http://www.postgresqlconference.org/2008/west/talks/

A note about that got sent to this list at one point but I don't see any
follow-up:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00802.php

Anybody have a better idea of what happened with that project than me?

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#21

Joshua Tolley

eggyknap@gmail.com

over 16 years ago

In reply to: Tom Lane (#16)

Re: generic options for explain

On Sun, May 24, 2009 at 06:53:29PM -0400, Tom Lane wrote:

Greg Smith <gsmith@gregsmith.com> writes:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are. So from a strict point of view
this only makes sense when the output format is designed to not depend
on row ordering to convey information. We could certainly invent such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.

The Oracle version, as it fills the table of explain results, gives each number
an id and the id of its parent row, which behavior we could presumably copy.
I'm definitely keen to keep a human-readable EXPLAIN such as we have now, to
augment the table-based proposal, but a table would provide the more flexible
output we'd need for more detailed reporting, a simple interface for
applications to consume the EXPLAIN data without human intervention, and a
convenient platform from whence the data can be transformed to XML, JSON, etc.
for those that are so inclined.

- Josh / eggyknap

#22

Pavel Stehule

pavel.stehule@gmail.com

over 16 years ago

In reply to: Tom Lane (#16)

Re: generic options for explain

2009/5/25 Tom Lane <tgl@sss.pgh.pa.us>:

Greg Smith <gsmith@gregsmith.com> writes:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding functions that
actually do the work required. Then, if there's a set of those that look
to be extremely useful, maybe at that point it's worth talking about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are. So from a strict point of view
this only makes sense when the output format is designed to not depend
on row ordering to convey information. We could certainly invent such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.

I don't expect so functional EXPLAIN will be used by users directly.
It' data source for some "GUI". And currently with CTE, there are not
problem transform query to similar output like current EXPLAIN. I am
able to understand some new parameters for explain statement (when
result will be directly read by user), but some output options (or
formating options) I would to see in some other functions.

regards
Pavel Stehule

Show quoted text

regards, tom lane

#23

Dimitri Fontaine

dfontaine@hi-media.com

over 16 years ago

In reply to: Robert Haas (#12)

Re: generic options for explain

Hi,

After having read all the followups I already received, I prefer to
answer to this particular message.

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, May 24, 2009 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

On the other side of the coin, I'm strongly against inventing more than
one new output format for EXPLAIN, and so any argument that depends on
examples such as "xml vs json" is falling on deaf ears here. I think
that we only need an XML option, and so EXPLAIN ANALYZE XML ... doesn't
seem untenable. What other options than those do you really need?
Not ones to add or remove output fields; we'd expect the client to
ignore fields it doesn't care about.

It's not just about me; we've had many requests for new EXPLAIN
features.

[...]

I think XML output format is a complete distraction from the real
issue here, which is that there are certain pieces of information that
are sometimes useful but are not useful enough to justify including
them in the EXPLAIN output 100% of the time.

I think the summary here is to say that we want two modes of operations:
- the current one, which continues to get refinements

- a new one conveying all possible information in machine readable
formats, possibly with some tools to handle it easily: XML output and
maybe XSLT stylesheets

Anyway, I'm suprised by the reaction to this patch, but I'll drop it.
I would like to make the EXPLAIN syntax more powerful for command-line
use, and I'd implement XML format and JSON along the way just for
completeness. But I don't have much interest in creating an XML
output format that is the ONLY way of getting more information,
because I'm a command-line user and it does me no good at all. :-(

That's only because you seem to be thinking that having core PostgreSQL
do the first half of the work means you as a user will have to do the
second part. I assume pgadmin and phppgadmin developers will offer their
users some graphical approach to the output reading, with dynamic
filtering, eg.

I don't see anything stopping you to provide a simple way to have the
same facility into psql. You can already have the query output filtered
by any script you want this way:
=# \o |my_presentation_script <style name> | <stylesheet full path>
=# explain XML ...
=# \o

Now, we talked about a better filtering integration into psql more than
once in the past, so there's a path for you to have both complete
EXPLAIN and command line tool suite integration, I'd say.

Oh and by the way, even if I don't like XML that much, this battled is
lost already. The way I understand it, -core will accept only two
EXPLAIN outputs: fit on screen and complete. The second one will have to
be easy for machine consumption, and so use a format anyone can work
with. The choice has been made, it's XML.

If you want to develop analysis tools which will parse JSON rather than
XML, nothing stops you writing a stylesheet to provide JSON output from
XML. Of course then you want this to happen server-side, so you want the
EXPLAIN output to be processed from within the database, all the more if
it's XML and we have the ability to apply XSLT to XML content already in
the server.

But being able to apply SQL level function calls to EXPLAIN output seems
a different subject alltogether...

Regards,
--
dim

#24

Noname

tomas@tuxteam.de

over 16 years ago

In reply to: Dimitri Fontaine (#23)

Re: generic options for explain

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[Sent by mistake to Robert Haas only at first try. No cure for fat
fingers, I guess]

On Sun, May 24, 2009 at 04:05:18PM -0400, Robert Haas wrote:

[...]

I think XML output format is a complete distraction from the real
issue here, which is that there are certain pieces of information that
are sometimes useful but are not useful enough to justify including
them in the EXPLAIN output 100% of the time. By just punting all that
stuff to EXPLAIN XML, we're just saying that we're not interested in
creating a workable set of options to allow users to pick and choose
the information they care about - so instead we're going to dump a
huge chunk of unreadable XML and then make it the user's problem to
find a tool that will extract the details that they care about. Boo,
hiss.

In my experience, this happens with "XML the data description language"
many times, but I haven' seen the problem as well-stated as in your
mail.

Hard for humans to read, hard for machines to read, and often trying to
solve a problem it can't (in this case, selecting the needed information
_beforehand_).

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKGntjBcgs9XrR2kYRAk1/AJ4rnZFnU4PFM8AJkaYYYLRInYHJDQCbBbt2
lTwxydBBnXP1MgDxz+vcpM8=
=o2qW
-----END PGP SIGNATURE-----

Import Notes

Reply to msg id not found: 20090525042144.GA27707@tomas | Resolved by subject fallback

#25

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Dimitri Fontaine (#23)

Re: generic options for explain

On Mon, May 25, 2009 at 6:24 AM, Dimitri Fontaine
<dfontaine@hi-media.com> wrote:

I think the summary here is to say that we want two modes of operations:
- the current one, which continues to get refinements

- a new one conveying all possible information in machine readable
formats, possibly with some tools to handle it easily: XML output and
maybe XSLT stylesheets

I don't agree with that summary. Many people who responded to this
thread were fine with the idea of some sort of options syntax, but we
had at least four different proposals for how to implement it:

Robert Haas: EXPLAIN (foo 'bar', baz 'bletch', ...) query
Pavel Stehule: explain_query(query, options...) [exact format of
options not specified]
Andrew Dunstan: SET explain_format = 'foo, baz'; EXPLAIN query
Josh Tolley: EXPLAIN (foo, baz, ...) query [also suggested by me as an
idea I rejected]

Tom Lane was the only person to suggest that we only ever need one
more option to EXPLAIN and that it should be called XML. Even though
I prefer my format to the other options suggested (which I would
probably rank in order of descending preference Josh-Andrew-Pavel), I
am actually someone encouraged that we might have some kind of fragile
consensus that an extensible options syntax is useful (a point that
Andres Freund and Greg Smith also seemed to agree with).

Anyway, I'm suprised by the reaction to this patch, but I'll drop it.
I would like to make the EXPLAIN syntax more powerful for command-line
use, and I'd implement XML format and JSON along the way just for
completeness. But I don't have much interest in creating an XML
output format that is the ONLY way of getting more information,
because I'm a command-line user and it does me no good at all. :-(

That's only because you seem to be thinking that having core PostgreSQL
do the first half of the work means you as a user will have to do the
second part. I assume pgadmin and phppgadmin developers will offer their
users some graphical approach to the output reading, with dynamic
filtering, eg.

I don't see anything stopping you to provide a simple way to have the
same facility into psql. You can already have the query output filtered
by any script you want this way:
=# \o |my_presentation_script <style name> | <stylesheet full path>
=# explain XML ...
=# \o

This is all much more complicated than what I proposed, and I fail to
see what it buys us. I'd say that you're just reinforcing the point I
made upthread, which is that insisting that XML is the only way to get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

...Robert

#26

Joshua Tolley

eggyknap@gmail.com

over 16 years ago

In reply to: Robert Haas (#25)

Re: generic options for explain

On Mon, May 25, 2009 at 07:14:56AM -0400, Robert Haas wrote:

Many people who responded to this
thread were fine with the idea of some sort of options syntax, but we
had at least four different proposals for how to implement it:

Robert Haas: EXPLAIN (foo 'bar', baz 'bletch', ...) query
Pavel Stehule: explain_query(query, options...) [exact format of
options not specified]
Andrew Dunstan: SET explain_format = 'foo, baz'; EXPLAIN query
Josh Tolley: EXPLAIN (foo, baz, ...) query [also suggested by me as an
idea I rejected]

I hadn't actually caught that there were two ideas on the table with syntax
similar to "EXPLAIN (...) <query>", and don't mean to champion either of the
two specifically (at least until I've read closely enough to note the
difference). I just kinda liked the "EXPLAIN (some options of some sort)
<query>" syntax better than other proposals.

That said, I think I'm changing my vote in favor of Pavel. It's my guess that
some variant of his version would be the easiest to make compliant with the
bit I'm most interested in, which is not being limited to, say,
XML/JSON/YAML/etc. in the output. Applications that use PostgreSQL will of
necessity need to know how to handle data presented in tables, so let's
present our explain results as a table. As has been said in other branches of
this thread, that way we don't force applications also to support
XML/JSON/YAML/etc. We might consider providing functions to convert the
tabular result to one or more of those formats, but at its inception, the data
should live as tuples in a relation.

In other messages, I've advocated actually inserting the data into a table. I
think that was a mistake. Who makes the table? What's it called? What schema
is it in? Who cleans it up when we're done with it? ...etc. I'd much rather
see a bunch of rows returned as a set, which I can then insert into a table,
pass into a function for reformatting, or just consume in an application.

All of which leads me to this variant of the functional approach as my answer:

SELECT * FROM pg_explain_query("<query>", <options in a still-unspecified
format>);

I could then do things like this:

CREATE TABLE explain_results AS SELECT * FROM pg_explain_query(...);

and this:

SELECT xmlify_a_record(pg_explain_query(...));

- Josh / eggyknap

#27

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Joshua Tolley (#21)

Re: generic options for explain

Joshua Tolley <eggyknap@gmail.com> writes:

The Oracle version, as it fills the table of explain results, gives
each number an id and the id of its parent row, which behavior we
could presumably copy. I'm definitely keen to keep a human-readable
EXPLAIN such as we have now, to augment the table-based proposal, but
a table would provide the more flexible output we'd need for more
detailed reporting, a simple interface for applications to consume the
EXPLAIN data without human intervention, and a convenient platform
from whence the data can be transformed to XML, JSON, etc. for those
that are so inclined.

I would think a table would be considerably *less* flexible --- you
could not easily change the output column set. Unless you're imagining
just dumping something equivalent to the current output into a text
column. Which would be flexible, but it'd hardly have any of the
other desirable properties you list.

regards, tom lane

#28

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Robert Haas (#25)

Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I fail to
see what it buys us. I'd say that you're just reinforcing the point I
made upthread, which is that insisting that XML is the only way to get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

The impression I have is that (to misquote Churchill) XML is the worst
option available, except for all the others. We need something that can
represent a fairly complex data structure, easily supports addition or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't have to
write your own parser, in most languages". How many realistic
alternatives are there?

regards, tom lane

#29

Joshua Tolley

eggyknap@gmail.com

over 16 years ago

In reply to: Tom Lane (#27)

Re: generic options for explain

On Mon, May 25, 2009 at 10:55:48AM -0400, Tom Lane wrote:

Joshua Tolley <eggyknap@gmail.com> writes:

The Oracle version, as it fills the table of explain results, gives
each number an id and the id of its parent row, which behavior we
could presumably copy. I'm definitely keen to keep a human-readable
EXPLAIN such as we have now, to augment the table-based proposal, but
a table would provide the more flexible output we'd need for more
detailed reporting, a simple interface for applications to consume the
EXPLAIN data without human intervention, and a convenient platform
from whence the data can be transformed to XML, JSON, etc. for those
that are so inclined.

I would think a table would be considerably *less* flexible --- you
could not easily change the output column set. Unless you're imagining
just dumping something equivalent to the current output into a text
column. Which would be flexible, but it'd hardly have any of the
other desirable properties you list.

I'm not sure I see why it would be less flexible. I'm imagining we define some
record type, and a function that returns a set of those records. The fields in
the record would include data element this version of explain could possibly
return. Then you call a function to explain a query, passing it some options
to select which of those data elements you're interested in. The function
returns the same data type at each call, filling with NULLs the fields you've
told it you're uninterested in. Changes between versions would modify this
data type, but provided consumer applications have specified the columns
they're interested in (rather than, say, SELECT *) that shouldn't bother
anyone.

The functions to change this into some other format would probably need to be
more intelligent than just that. That seems a fair price to pay.

- Josh / eggyknap

#30

Massa, Harald Armin

chef@ghum.de

over 16 years ago

In reply to: Tom Lane (#28)

Re: generic options for explain

The impression I have is that (to misquote Churchill) XML is the worst
option available, except for all the others. We need something that can
represent a fairly complex data structure, easily supports addition or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't have to
write your own parser, in most languages". How many realistic
alternatives are there?

One realistic alternative may be JSON: parsers for nearly all languages are

freely available; everything web-affected speeks it perfectly; it's easier
on the eye then XML, less bytes per information, addition and removal of
fields as well as complex structures are possible.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
LASIK good, steroids bad?

#31

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Joshua Tolley (#29)

Re: generic options for explain

Joshua Tolley <eggyknap@gmail.com> writes:

I'm not sure I see why it would be less flexible. I'm imagining we define some
record type, and a function that returns a set of those records.

I'm unimpressed by the various proposals to change EXPLAIN into a
function. Quoting the command-to-explain is going to be a pain in the
neck. And can you really imagine using it manually, especially if it
returns so many fields that you *have to* write out the list of fields
you actually want, else the result is unreadable? It's going to be just
as much of something you can only use through a helper application as
the XML way would be.

regards, tom lane

#32

David Fetter

david@fetter.org

over 16 years ago

In reply to: Tom Lane (#28)

Re: generic options for explain

On Mon, May 25, 2009 at 11:02:53AM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I fail
to see what it buys us. I'd say that you're just reinforcing the
point I made upthread, which is that insisting that XML is the
only way to get more detailed information will just create a
cottage industry of beating that XML output format into
submission.

The impression I have is that (to misquote Churchill) XML is the
worst option available, except for all the others. We need
something that can represent a fairly complex data structure, easily
supports addition or removal of particular fields in the structure
(including fields not foreseen in the original design), is not hard
for programs to parse, and is widely supported --- ie, "not hard"
includes "you don't have to write your own parser, in most
languages". How many realistic alternatives are there?

JSON for one, and it's *much* lighter in just about every way.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#33

Tom Raney

twraney@comcast.net

over 16 years ago

In reply to: David Fetter (#32)

Re: generic options for explain

David Fetter wrote:

On Mon, May 25, 2009 at 11:02:53AM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I fail
to see what it buys us. I'd say that you're just reinforcing the
point I made upthread, which is that insisting that XML is the
only way to get more detailed information will just create a
cottage industry of beating that XML output format into
submission.

The impression I have is that (to misquote Churchill) XML is the
worst option available, except for all the others. We need
something that can represent a fairly complex data structure, easily
supports addition or removal of particular fields in the structure
(including fields not foreseen in the original design), is not hard
for programs to parse, and is widely supported --- ie, "not hard"
includes "you don't have to write your own parser, in most
languages". How many realistic alternatives are there?

JSON for one, and it's *much* lighter in just about every way.

Cheers,
David.

For what it's worth, if this revised output form is destined for
consumption by a machine, it really doesn't matter what protocol is used
and how 'readable' it is by humans, as long as the protocol can express
all the present and anticipated variations of the data without breaking
parsers along the way.

While building the Visual Planner tool, I selected XML output for no
other reason than it was easy to parse on the receiving end and was
hierarchical, making it perfect for representing a plan tree - or
thousands. I'm sure other alternatives would have been fine as well.
But, once that decision was made, I never had any reason again to look
at the XML stream.

If we're worried about the excess 'weight' of XML, I found this to be a
non-issue in practice. The output generated by the
Visual-Planner-Enabled Postgres server contains MUCH more information
that one would typically see with standard EXPLAIN. The tool returns
not only the most-optimal plan, but all discarded plans as well. A four
way join results in output of 24k lines of XML. While it parses nearly
instantly, the biggest delay is in the network. And, even this is minimal.

So, why not put ALL interesting data in the EXPLAIN XML feed? I'm not
suggesting for this discussion that we include discarded plans, but that
we include every piece of data that may be of interest to folks building
connecting tools. The parsers can pick and choose what they use easily
and, because the feed isn't positional, won't break when addition data
is added. A GUC parameter could govern the data included in this
variant of EXPLAIN, but even that seems unnecessary. This approach will
allow the standard EXPLAIN to evolve in whatever way pleases the humans
without interfering with the machines.

Regards,

Tom Raney

#34

Michael Glaesemann

grzm@seespotcode.net

over 16 years ago

In reply to: Joshua Tolley (#21)

Re: generic options for explain

On May 25, 2009, at 0:47 , Joshua Tolley wrote:

On Sun, May 24, 2009 at 06:53:29PM -0400, Tom Lane wrote:

Greg Smith <gsmith@gregsmith.com> writes:

On Sun, 24 May 2009, Pavel Stehule wrote:

we should have a secondary function explain_query(query_string,
option) that returns setof some.

+1. The incremental approach here should first be adding
functions that
actually do the work required. Then, if there's a set of those
that look
to be extremely useful, maybe at that point it's worth talking
about how
to integrate them into the parser. Starting with the parser changes
rather than the parts that actually do the work is backwards. If
you do
it the other way around, at all times you have a patch that actually
provides immediate useful value were it to be committed.

Something that returns a setof can also be easily used to
implement the
"dump EXPLAIN to a table" feature Josh Tolley brought up (which is
another
common request in this area).

A serious problem with EXPLAIN via a function returning set, or with
putting the result into a table, is that set results are logically
unordered, just as table contents are. So from a strict point of
view
this only makes sense when the output format is designed to not
depend
on row ordering to convey information. We could certainly invent
such
a format, but I think it's a mistake to go in this direction for
EXPLAIN output that is similar to the current output.

The Oracle version, as it fills the table of explain results, gives
each number
an id and the id of its parent row, which behavior we could
presumably copy.

Or some other schema that allows us to preserve the tree.

Michael Glaesemann
grzm seespotcode net

#35

Joshua Tolley

eggyknap@gmail.com

over 16 years ago

In reply to: Tom Lane (#31)

Re: generic options for explain

On Mon, May 25, 2009 at 11:22:24AM -0400, Tom Lane wrote:

Joshua Tolley <eggyknap@gmail.com> writes:

I'm not sure I see why it would be less flexible. I'm imagining we define some
record type, and a function that returns a set of those records.

I'm unimpressed by the various proposals to change EXPLAIN into a
function. Quoting the command-to-explain is going to be a pain in the
neck.

Yeah, that's been bugging me, despite my recent support of that plan.

And can you really imagine using it manually, especially if it
returns so many fields that you *have to* write out the list of fields
you actually want, else the result is unreadable? It's going to be just
as much of something you can only use through a helper application as
the XML way would be.

Good point. The reason, as I remember it, that we wanted to be able to specify
what fields are returned was so that fields that are expensive to calculate
are calculated only when the user wants them. If that's the only
consideration, perhaps we should have a standard version and a "FULL" version,
e.g.

EXPLAIN [ANALYZE] [FULL] <query>

...where FULL would indicate the user wanted the all available statistics, not
just the cheap ones. Somewhere in there we'd also need an indicator to say we
wanted an output format other than the usual text version (such as the "WITH
XML" clause I think someone suggested); I maintain it's all just a table of
data, and should be represented the same way we represent any other table of
data.

- Josh / eggyknap

#36

Andres Freund

andres@anarazel.de

over 16 years ago

In reply to: Tom Raney (#33)

Re: generic options for explain

Hi Tom,

On 05/25/2009 08:04 PM, Tom Raney wrote:

So, why not put ALL interesting data in the EXPLAIN XML feed? I'm not
suggesting for this discussion that we include discarded plans, but that
we include every piece of data that may be of interest to folks building
connecting tools. The parsers can pick and choose what they use easily
and, because the feed isn't positional, won't break when addition data
is added. A GUC parameter could govern the data included in this variant
of EXPLAIN, but even that seems unnecessary. This approach will allow
the standard EXPLAIN to evolve in whatever way pleases the humans
without interfering with the machines.

Well, there is the problem Robert Haas described - some stats may be too
expensive to gather (like the io-stats) for regular use, but still be
quite usefull.

Andres

#37

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#31)

Re: generic options for explain

On Mon, May 25, 2009 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joshua Tolley <eggyknap@gmail.com> writes:

I'm not sure I see why it would be less flexible. I'm imagining we define some
record type, and a function that returns a set of those records.

I'm unimpressed by the various proposals to change EXPLAIN into a
function. Quoting the command-to-explain is going to be a pain in the
neck.

I agree with this, but there is a lot of sentiment (which I share)
that it should be possible to capture EXPLAIN output using subselect
or CTAS syntax, regardless of exactly what that output ends up being.
That seems to require that EXPLAIN be a fully-reserved keyword, so I
wonder what we think about that.

(The way I tested this quickly is to make '(' ExplainStmt ')' a third
production for select_with_parens. I'm not 100% sure that's the right
place for it, but a couple of other reasonable-looking places produced
non-obvious parsing conflicts.)

And can you really imagine using it manually, especially if it
returns so many fields that you *have to* write out the list of fields
you actually want, else the result is unreadable? It's going to be just
as much of something you can only use through a helper application as
the XML way would be.

Nothing could possibly be as bad as XML. I'm with Josh: if we produce
table-formatted output, someone can always turn it into XML or JSON or
whatever they want. The reverse figures to be a whole lot more
difficult.

...Robert

#38

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Robert Haas (#37)

Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

I agree with this, but there is a lot of sentiment (which I share)
that it should be possible to capture EXPLAIN output using subselect
or CTAS syntax, regardless of exactly what that output ends up being.

Well, it should be possible to capture the output, but let's not
prejudge the syntax.

That seems to require that EXPLAIN be a fully-reserved keyword, so I
wonder what we think about that.

Nonstarter, especially when it's so obviously possible to do it without
that. The threshold for reserving words that aren't reserved by SQL
spec has to be really high, because you will break innocent applications
that way.

Before anyone gets overly excited about having special syntax for this,
I should point out that you can do it today, for instance like so:

regression=# create function expl(q text) returns setof text as $$
regression$# declare r record;
regression$# begin
regression$# for r in execute 'explain ' || q loop
regression$# return next r."QUERY PLAN";
regression$# end loop;
regression$# end$$ language plpgsql;
CREATE FUNCTION

regression=# select * from expl('select * from tenk1');
expl
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
(1 row)

Admittedly this is a bit inconvenient, but the point is that the
functionality does exist. There is no need to have a built-in
version of this function unless we get significant advantages
from having it built-in, and right now I'm not seeing those.

regards, tom lane

#39

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#38)

Re: generic options for explain

On Mon, May 25, 2009 at 6:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I agree with this, but there is a lot of sentiment (which I share)
that it should be possible to capture EXPLAIN output using subselect
or CTAS syntax, regardless of exactly what that output ends up being.

Well, it should be possible to capture the output, but let's not
prejudge the syntax.

That seems to require that EXPLAIN be a fully-reserved keyword, so I
wonder what we think about that.

Nonstarter, especially when it's so obviously possible to do it without
that. The threshold for reserving words that aren't reserved by SQL
spec has to be really high, because you will break innocent applications
that way.

Before anyone gets overly excited about having special syntax for this,
I should point out that you can do it today, for instance like so:

regression=# create function expl(q text) returns setof text as $$
regression$# declare r record;
regression$# begin
regression$# for r in execute 'explain ' || q loop
regression$# return next r."QUERY PLAN";
regression$# end loop;
regression$# end$$ language plpgsql;
CREATE FUNCTION

regression=# select * from expl('select * from tenk1');
expl
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
(1 row)

Admittedly this is a bit inconvenient, but the point is that the
functionality does exist. There is no need to have a built-in
version of this function unless we get significant advantages
from having it built-in, and right now I'm not seeing those.

The only problem I see with this is that there's no convenient way of
specifying the options you want. Granted, it wouldn't be all that
difficult to add a couple of boolean options to specify the state of
EXPLAIN and ANALYZE, but if we have more options, it starts to get a
bit complicated, especially if they do things like change the set of
output columns. I'm still liking that idea, but even if we don't end
up implementing that particular thing, I'm really doubtful that the
need to make EXPLAIN do more things is likely to go away.

...Robert

#40

Greg Stark

stark@enterprisedb.com

over 16 years ago

In reply to: Tom Lane (#38)

Re: generic options for explain

On Mon, May 25, 2009 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Admittedly this is a bit inconvenient, but the point is that the
functionality does exist. There is no need to have a built-in
version of this function unless we get significant advantages
from having it built-in, and right now I'm not seeing those.

I assume people don't want the *text* of the current output format but
the actual values in separate columns. So you could do things like
accumulate the data in a table and later use sql to search for queries
using specific indexes or where estimates are off etc.

--
greg

#41

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Greg Stark (#40)

Re: generic options for explain

Greg Stark <stark@enterprisedb.com> writes:

On Mon, May 25, 2009 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Admittedly this is a bit inconvenient, but the point is that the
functionality does exist. �There is no need to have a built-in
version of this function unless we get significant advantages
from having it built-in, and right now I'm not seeing those.

I assume people don't want the *text* of the current output format but
the actual values in separate columns.

Well, I notice that everyone is carefully dodging the subject of exactly
what columns they want, but my example would clearly scale easily to any
specific set of output columns that EXPLAIN might return instead of one
text column. Since we were previously told that any particular release
of PG need only offer one set of possible output columns, I figured the
problem was solved ;-)

regards, tom lane

#42

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#41)

Re: generic options for explain

On Mon, May 25, 2009 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <stark@enterprisedb.com> writes:

On Mon, May 25, 2009 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Admittedly this is a bit inconvenient, but the point is that the
functionality does exist. There is no need to have a built-in
version of this function unless we get significant advantages
from having it built-in, and right now I'm not seeing those.

I assume people don't want the *text* of the current output format but
the actual values in separate columns.

Well, I notice that everyone is carefully dodging the subject of exactly
what columns they want,

I had a try at this upthread, actually, but it's not a real easy problem.

but my example would clearly scale easily to any
specific set of output columns that EXPLAIN might return instead of one
text column. Since we were previously told that any particular release
of PG need only offer one set of possible output columns, I figured the
problem was solved ;-)

I was totally unconvinced by that argument.

I actually think that the best data structure for this would be
something like hstore. It would sure be nice to be able to manipulate
this data using SQL: I am sure there are people on this mailing list
who hate XML and maybe a few who hate JSON, but if they hate SQL then
they're off my list of people I care about making happy. :-) At the
same time, the variable number of output columns is problematic for a
flat table representation. It may not be so problematic that we can't
work around it, but it's definitely not great.

It's really the pits to think that our data model is so impoverished
that it can't in a reasonable way handle the output of our EXPLAIN
command. It would be awfully sweet to be able to do things like: show
me all the plan nodes where the expected and actual row counts
differed by more than a factor of 10. And why should I need an
external XML/JSON parser to do that, rather than just a WHERE clause?

...Robert

#43

Peter Eisentraut

peter_e@gmx.net

over 16 years ago

In reply to: Tom Lane (#28)

Re: generic options for explain

On Monday 25 May 2009 18:02:53 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I fail to
see what it buys us. I'd say that you're just reinforcing the point I
made upthread, which is that insisting that XML is the only way to get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

The impression I have is that (to misquote Churchill) XML is the worst
option available, except for all the others. We need something that can
represent a fairly complex data structure, easily supports addition or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't have to
write your own parser, in most languages". How many realistic
alternatives are there?

I think we are going in the wrong direction. No one has said that they want a
machine-readable EXPLAIN format. OK, there are historically about three
people that want one, but they have already solved the problem of parsing the
current format. And without having writtens such a parser myself I think that
the current format is not inherently hard to parse.

What people really want is optional additional information in the human-
readable format. Giving them a machine readable format does not solve the
problem. Giving them a machine readable format with all-or-none of the
optional information and saying "figure it out yourself" does not solve
anything either. The same people who currently complain will continue to
complain.

#44

Magnus Hagander

magnus@hagander.net

over 16 years ago

In reply to: Peter Eisentraut (#43)

Re: generic options for explain

Peter Eisentraut wrote:

On Monday 25 May 2009 18:02:53 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I fail to
see what it buys us. I'd say that you're just reinforcing the point I
made upthread, which is that insisting that XML is the only way to get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

The impression I have is that (to misquote Churchill) XML is the worst
option available, except for all the others. We need something that can
represent a fairly complex data structure, easily supports addition or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't have to
write your own parser, in most languages". How many realistic
alternatives are there?

I think we are going in the wrong direction. No one has said that they want a
machine-readable EXPLAIN format.

That is not true. Tool developers like pgAdmin (I know that one for
sure), phpPgAdmin (I think they have said it too) and third party tools
have asked for this.

Right now we parse the EXPLAIN output. Which doesn't get easier with
each new thing we add to it :-)

It would be very nice to have it tool parseable.

I'm also fairly certain that people using auto_explain would have use
for a format that's easier to parse.

What people really want is optional additional information in the human-
readable format. Giving them a machine readable format does not solve the
problem. Giving them a machine readable format with all-or-none of the
optional information and saying "figure it out yourself" does not solve
anything either. The same people who currently complain will continue to
complain.

I agree that this is a separate issue. But that doesn't mean they don't
both exist.

//Magnus

#45

Dave Page

dpage@pgadmin.org

over 16 years ago

In reply to: Peter Eisentraut (#43)

Re: generic options for explain

On Tue, May 26, 2009 at 8:15 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

I think we are going in the wrong direction. No one has said that they want a
machine-readable EXPLAIN format. OK, there are historically about three
people that want one, but they have already solved the problem of parsing the
current format.

Pretty sure I've said I want one. And whilst it's true, we already
parse the current output in pgAdmin, it's a PITA whenever the format
changes. I also want a format in which Tom is not going to refuse to
include additional data (such as the schema a relation is in) because
it clutters the output. A machine readable format would seem to the
idea way to include all data we may need, without making
human-readable output an unreadable mess.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#46

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Peter Eisentraut (#43)

Re: generic options for explain

On May 26, 2009, at 8:15 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On Monday 25 May 2009 18:02:53 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I fail
to
see what it buys us. I'd say that you're just reinforcing the
point I
made upthread, which is that insisting that XML is the only way to
get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

The impression I have is that (to misquote Churchill) XML is the
worst
option available, except for all the others. We need something
that can
represent a fairly complex data structure, easily supports addition
or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't have
to
write your own parser, in most languages". How many realistic
alternatives are there?

I think we are going in the wrong direction. No one has said that
they want a
machine-readable EXPLAIN format. OK, there are historically about
three
people that want one, but they have already solved the problem of
parsing the
current format. And without having writtens such a parser myself I
think that
the current format is not inherently hard to parse.

What people really want is optional additional information in the
human-
readable format. Giving them a machine readable format does not
solve the
problem. Giving them a machine readable format with all-or-none of
the
optional information and saying "figure it out yourself" does not
solve
anything either. The same people who currently complain will
continue to
complain.

Peter,

The check is in the mail. :-)

In all seriousness, I have no problem at all with providing machine-
readable formats, but the problem you're describing here is definitely
my primary pain point.

...Robert

#47

Greg Stark

greg.stark@enterprisedb.com

over 16 years ago

In reply to: Peter Eisentraut (#43)

Re: generic options for explain

Well I want an SQL query-able format. I also want a way to retrieve
the data for a query run from within an application without disturbing
the application i.e. while still returning the regular result set.

But I also like being able to conveniently run explain and get the
results formatted to fit on the screen in a single step. I don't see
anything wrong with Robert's direction to pass options to explain. It
doesn't solve every problem but it doesn't make any of the other
things we need harder either.

On a bike-shedding note I would rather have the rhs of the option be
optional and default to true for boolean options.

Actually if we make a set of explain_* guc options we could make the
options just locally set those options.

--
Greg

On 26 May 2009, at 13:15, Peter Eisentraut <peter_e@gmx.net> wrote:

Show quoted text

On Monday 25 May 2009 18:02:53 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I fail
to
see what it buys us. I'd say that you're just reinforcing the
point I
made upthread, which is that insisting that XML is the only way to
get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

The impression I have is that (to misquote Churchill) XML is the
worst
option available, except for all the others. We need something
that can
represent a fairly complex data structure, easily supports addition
or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't have
to
write your own parser, in most languages". How many realistic
alternatives are there?

I think we are going in the wrong direction. No one has said that
they want a
machine-readable EXPLAIN format. OK, there are historically about
three
people that want one, but they have already solved the problem of
parsing the
current format. And without having writtens such a parser myself I
think that
the current format is not inherently hard to parse.

What people really want is optional additional information in the
human-
readable format. Giving them a machine readable format does not
solve the
problem. Giving them a machine readable format with all-or-none of
the
optional information and saying "figure it out yourself" does not
solve
anything either. The same people who currently complain will
continue to
complain.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Greg Stark (#47)

Re: generic options for explain

On May 26, 2009, at 8:46 AM, Greg Stark <greg.stark@enterprisedb.com>
wrote:

Well I want an SQL query-able format. I also want a way to retrieve
the data for a query run from within an application without
disturbing the application i.e. while still returning the regular
result set.

But I also like being able to conveniently run explain and get the
results formatted to fit on the screen in a single step. I don't see
anything wrong with Robert's direction to pass options to explain.
It doesn't solve every problem but it doesn't make any of the other
things we need harder either.

Your check is in the mail, too.

On a bike-shedding note I would rather have the rhs of the option be
optional and default to true for boolean options.

I was thinking about that, too, so +1.

Actually if we make a set of explain_* guc options we could make the
options just locally set those options.

I think that's probably over-complicated, but that's just MHO.

...Robert

Show quoted text

--
Greg

On 26 May 2009, at 13:15, Peter Eisentraut <peter_e@gmx.net> wrote:

On Monday 25 May 2009 18:02:53 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This is all much more complicated than what I proposed, and I
fail to
see what it buys us. I'd say that you're just reinforcing the
point I
made upthread, which is that insisting that XML is the only way
to get
more detailed information will just create a cottage industry of
beating that XML output format into submission.

The impression I have is that (to misquote Churchill) XML is the
worst
option available, except for all the others. We need something
that can
represent a fairly complex data structure, easily supports
addition or
removal of particular fields in the structure (including fields not
foreseen in the original design), is not hard for programs to parse,
and is widely supported --- ie, "not hard" includes "you don't
have to
write your own parser, in most languages". How many realistic
alternatives are there?

I think we are going in the wrong direction. No one has said that
they want a
machine-readable EXPLAIN format. OK, there are historically about
three
people that want one, but they have already solved the problem of
parsing the
current format. And without having writtens such a parser myself I
think that
the current format is not inherently hard to parse.

What people really want is optional additional information in the
human-
readable format. Giving them a machine readable format does not
solve the
problem. Giving them a machine readable format with all-or-none of
the
optional information and saying "figure it out yourself" does not
solve
anything either. The same people who currently complain will
continue to
complain.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Peter Eisentraut (#43)

Re: generic options for explain

Peter Eisentraut <peter_e@gmx.net> writes:

I think we are going in the wrong direction. No one has said that
they want a machine-readable EXPLAIN format. OK, there are
historically about three people that want one, but they have already
solved the problem of parsing the current format.

Well, obviously the set of tool designers is smaller than the set of
casual users of EXPLAIN, but their problems are none the less real and
very important.

What people really want is optional additional information in the human-
readable format. Giving them a machine readable format does not solve the
problem.

Actually, the exact problem is this: those two goals are in conflict.
There'd be little objection to adding any random set of optional stuff
to EXPLAIN's textual output, if it weren't for the fact that it would
make machine parsing of that output even harder than it is already.

So my feeling is that we need a machine-readable format containing all
the data in order to satisfy the needs of tool designers. Once they
are freed from having to parse EXPLAIN's textual output, we can whack
the textual output around all we want. (Which kills my previous
argument that we only need one new option, but such is life.)

Now there is a third set of desires having to do with being able to do
simple SQL-based analysis of EXPLAIN output. That's the piece I think
we don't have a good handle on. In particular, it's not clear whether
a SQL-friendly output format can be the same as either of the other
two. (I don't personally find this goal very compelling --- there is
no natural law saying that SQL is a good tool for analyzing EXPLAIN
output --- but I'm willing to look at it to see if it's feasible.)

regards, tom lane

#50

Andrew Dunstan

andrew@dunslane.net

over 16 years ago

In reply to: Tom Lane (#49)

Re: generic options for explain

Tom Lane wrote:

Now there is a third set of desires having to do with being able to do
simple SQL-based analysis of EXPLAIN output. That's the piece I think
we don't have a good handle on. In particular, it's not clear whether
a SQL-friendly output format can be the same as either of the other
two. (I don't personally find this goal very compelling --- there is
no natural law saying that SQL is a good tool for analyzing EXPLAIN
output --- but I'm willing to look at it to see if it's feasible.)

In libxml-enabled builds at least, this could presumably be done fairly
easily via the XML functions, especially if we get XSLT processing into
the core XML functionality as I hope we can do this release. In fact,
the ability to leverage existing XML functionality to munge the output
is the thing that swings me in favor of XML as the machine readable
output format instead of JSON, since we don't have and aren't terribly
likely to get an inbuilt JSON parser. It means we wouldn't need some
external tool at all.

cheers

andrew

#51

Dave Page

dpage@pgadmin.org

over 16 years ago

In reply to: Andrew Dunstan (#50)

Re: generic options for explain

On Tue, May 26, 2009 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

In libxml-enabled builds at least, this could presumably be done fairly
easily via the XML functions, especially if we get XSLT processing into the
core XML functionality as I hope we can do this release. In fact, the
ability to leverage existing XML functionality to munge the output is the
thing that swings me in favor of XML as the machine readable output format
instead of JSON, since we don't have and aren't terribly likely to get an
inbuilt JSON parser. It means we wouldn't need some external tool at all.

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#52

Magnus Hagander

magnus@hagander.net

over 16 years ago

In reply to: Dave Page (#51)

Re: generic options for explain

Dave Page wrote:

On Tue, May 26, 2009 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

In libxml-enabled builds at least, this could presumably be done fairly
easily via the XML functions, especially if we get XSLT processing into the
core XML functionality as I hope we can do this release. In fact, the
ability to leverage existing XML functionality to munge the output is the
thing that swings me in favor of XML as the machine readable output format
instead of JSON, since we don't have and aren't terribly likely to get an
inbuilt JSON parser. It means we wouldn't need some external tool at all.

Actually, I think a number of users would be *very* happy if we had a
builtin JSON parser. I'm unsure on how feasible that is though.

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

Yeah, but probably not a huge one. There is one for wx, but I don't
think it's included by default.

--
Magnus Hagander
Self: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#53

Peter Eisentraut

peter_e@gmx.net

over 16 years ago

In reply to: Dave Page (#51)

Re: generic options for explain

On Tuesday 26 May 2009 16:55:55 Dave Page wrote:

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

I was actually looking for a C library for JSON (json type for PostgreSQL; you
know it is coming :-) ), but only found a library tied to glib, which,
considering the experience with libxml, did not excite me. If someone knows
of a different, small, and independent JSON library for C, I would like to
hear about it.

#54

Magnus Hagander

magnus@hagander.net

over 16 years ago

In reply to: Peter Eisentraut (#53)

Re: generic options for explain

Peter Eisentraut wrote:

On Tuesday 26 May 2009 16:55:55 Dave Page wrote:

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

I was actually looking for a C library for JSON (json type for PostgreSQL; you
know it is coming :-) ), but only found a library tied to glib, which,
considering the experience with libxml, did not excite me. If someone knows
of a different, small, and independent JSON library for C, I would like to
hear about it.

The JSon page (http://json.org/) lists for example
http://fara.cs.uni-potsdam.de/~jsg/json_parser/ which appears to not
need it. But it seems very simple - though I haven't actually looked
into the details.

--
Magnus Hagander
Self: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#55

Andrew Dunstan

andrew@dunslane.net

over 16 years ago

In reply to: Peter Eisentraut (#53)

Re: generic options for explain

Peter Eisentraut wrote:

On Tuesday 26 May 2009 16:55:55 Dave Page wrote:

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

I was actually looking for a C library for JSON (json type for PostgreSQL; you
know it is coming :-) ), but only found a library tied to glib, which,
considering the experience with libxml, did not excite me. If someone knows
of a different, small, and independent JSON library for C, I would like to
hear about it.

There are several listed at <http://www.json.org/>

cheers

andrew

#56

Noname

tomas@tuxteam.de

over 16 years ago

In reply to: Magnus Hagander (#52)

Re: generic options for explain

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, May 26, 2009 at 04:36:56PM +0200, Magnus Hagander wrote:

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

Yeah, but probably not a huge one. There is one for wx, but I don't
think it's included by default.

...and to put things into perspective:

tomas@floh:~$ apt-cache show libxml2 libjson-glib-1.0-0 | grep "^Size"
Size: 814356
Size: 33538

(not that I would recommend this one, since that's the one tied to glib,
but seems that XML parsing is nearly one and a half orders of magnitude
more complex than JSON).

- -- tomás
who thinks that XML-as-a-data-description-language is a denial of
service attack on us all
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKHATbBcgs9XrR2kYRArJmAJ4wJlvbnuWKYTvIDrSoLJccCyMTLwCbBM39
NCVSrDaEVad3NfogJrwRtiY=
=Volp
-----END PGP SIGNATURE-----

#57

Aidan Van Dyk

aidan@highrise.ca

over 16 years ago

In reply to: Noname (#56)

Re: generic options for explain

* tomas@tuxteam.de <tomas@tuxteam.de> [090526 11:03]:

...and to put things into perspective:

tomas@floh:~$ apt-cache show libxml2 libjson-glib-1.0-0 | grep "^Size"
Size: 814356
Size: 33538

And including glib, which does all the work for libjson-glib:

mountie@pumpkin:~/projects/postgresql/PostgreSQL$ apt-cache show libxml2
libjson-glib-1.0-0 libglib2.0-0 | grep ^Size
Size: 870188
Size: 36132
Size: 845166

glib also pulls in libpcre:
Size: 214650

So:
- XML: 870188(libxml) + 76038 (zlib1g) = 946226
- JSON: 36132 (json) + 845166 (glib) + 214650 (pcre) = 1095948

;-)

--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

#58

Dimitri Fontaine

dfontaine@hi-media.com

over 16 years ago

In reply to: Peter Eisentraut (#53)

Re: generic options for explain

Hi,

Peter Eisentraut <peter_e@gmx.net> writes:

I was actually looking for a C library for JSON (json type for PostgreSQL; you
know it is coming :-) ), but only found a library tied to glib, which,
considering the experience with libxml, did not excite me. If someone knows
of a different, small, and independent JSON library for C, I would like to
hear about it.

Looking at http://json.org/, it seems this particular project could fit:

http://lloyd.github.com/yajl/

Yet Another JSON Library. YAJL is a small event-driven (SAX-style)
JSON parser written in ANSI C, and a small validating JSON
generator. YAJL is released under the BSD license.

...

It's all ANSI C. It's been successfully compiled on debian linux, OSX
10.4 i386 & ppc, OSX 10.5 i386, winXP, FreeBSD 4.10, FreeBSD 6.1
amd64, FreeBSD 7 i386, and windows vista. More platforms and binaries
as time permits.

...

A second motivation for writing YAJL, was that many available free
JSON parsers fall over on large or complex inputs. YAJL is careful to
minimize memory copying and input re-scanning when possible. The
result is a parser that should be fast enough for most applications or
tunable for any application. On my mac pro (2.66 ghz) it takes 1s to
verify a 60meg json file. Minimizing that same file with json_reformat
takes 4s.

Largely because YAJL deals with streams, it's possible to parse JSON
in low memory environments. Oftentimes with other parsers an
application must hold both the input text and the memory
representation of the tree in memory at one time. With YAJL you can
incrementally read the input stream and hold only the in memory
representation. Or for filtering or validation tasks, it's not
required to hold the entire input text in memory.

Hope this helps, regards,
--
dim

#59

Noname

tomas@tuxteam.de

over 16 years ago

In reply to: Aidan Van Dyk (#57)

Re: generic options for explain

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, May 26, 2009 at 11:15:21AM -0400, Aidan Van Dyk wrote:

* tomas@tuxteam.de <tomas@tuxteam.de> [090526 11:03]:

...and to put things into perspective:

tomas@floh:~$ apt-cache show libxml2 libjson-glib-1.0-0 | grep "^Size"
Size: 814356
Size: 33538

And including glib, which does all the work for libjson-glib:

mountie@pumpkin:~/projects/postgresql/PostgreSQL$ apt-cache show libxml2
libjson-glib-1.0-0 libglib2.0-0 | grep ^Size
Size: 870188
Size: 36132
Size: 845166

glib also pulls in libpcre:
Size: 214650

So:
- XML: 870188(libxml) + 76038 (zlib1g) = 946226
- JSON: 36132 (json) + 845166 (glib) + 214650 (pcre) = 1095948

;-)

OK, OK, you win (darn: should have known those bloatophile gnomies.
Surprise that they don't pull in Mono :-(

But json-c (just downloaded & compiled) is more in the ballpark of 100K,
if I count all produced *.o

And it's BSD.

Regards
- -- tomás

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKHAvaBcgs9XrR2kYRAkasAJwPzzw3Os8e7QA2HvMSkQ0iRGWz+ACfYlp+
Y/v3EO+8sRiPzJNumADatdM=
=EjCU
-----END PGP SIGNATURE-----

#60

Joshua Tolley

eggyknap@gmail.com

over 16 years ago

In reply to: Dave Page (#51)

Re: generic options for explain

On Tue, May 26, 2009 at 09:55:55AM -0400, Dave Page wrote:

from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

...and using XML introduces a dependency for those that apps that don't already
use some XML parser. I realize that since the pool of apps that care to
mechanically parse EXPLAIN output is small, it wouldn't necessarily be a big
deal to hand each of them a new dependency in the form of a parser for XML,
JSON, etc. But we know the least common denominator is to return a set of
tuples; let's make sure that really is unworkable before forcing even that
dependency.

- Josh / eggyknap

#61

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Magnus Hagander (#52)

Re: generic options for explain

On Tue, May 26, 2009 at 10:36 AM, Magnus Hagander <magnus@hagander.net> wrote:

Dave Page wrote:

On Tue, May 26, 2009 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

In libxml-enabled builds at least, this could presumably be done fairly
easily via the XML functions, especially if we get XSLT processing into the
core XML functionality as I hope we can do this release. In fact, the
ability to leverage existing XML functionality to munge the output is the
thing that swings me in favor of XML as the machine readable output format
instead of JSON, since we don't have and aren't terribly likely to get an
inbuilt JSON parser. It means we wouldn't need some external tool at all.

Actually, I think a number of users would be *very* happy if we had a
builtin JSON parser. I'm unsure on how feasible that is though.

I think it's likely that with proper design the amount of extra code
that is required to support both XML and JSON is likely to be very
small. I don't think we're going to get away without supporting XML
because there are so many people already using XML-based tools, and I
find Andrew's argument that we already have some built-in XML support
that could possibly be used to smooth the road here as well pretty
compelling.

On the other hand, XML can be a really difficult technology to work
with because it doesn't map cleanly to the data structures that most
modern scripting languages (Perl, Python, Ruby, and probably Java and
others) use. As a simple example, if you have a hash like { a => 1, b
=> 2 } (using the Perl syntax) you can map it to
<hash><a>1</a>2</hash>. That's easy to generate, but the
reverse transformation is full of error-handling cases, like
<hash><a>1</a>2<c/></hash> and <hash><a>1</a><a>2</a></hash>.
I'm sure experienced XML hackers have ways to work around these
problems, but the XML libraries I've worked with basically don't even
try to turn the thing into any sort of general-purpose data structure.
They just let you ask questions like "What is the root element? OK,
now what elements does it contain? OK, there's an <a> tag there, what
does that have inside it? Any more-deeply-nested tags?". On the
other hand, JSON is explicitly designed to serialize and deserialize
data structures of this type, and it pretty much just works, even
between completely different programming languages.

So to summarize that - if we're only going to support one
machine-readable output format, it's probably got to be XML. But if
the additional effort to also support JSON is small, which I believe
to be the case, then I think it's worth doing because it's actually
better technology for this type of application. Maybe someone will
feel inspired to work up a contrib/json.

...Robert

#62

Andrew Dunstan

andrew@dunslane.net

over 16 years ago

In reply to: Robert Haas (#61)

Re: generic options for explain

Robert Haas wrote:

On the other hand, XML can be a really difficult technology to work
with because it doesn't map cleanly to the data structures that most
modern scripting languages (Perl, Python, Ruby, and probably Java and
others) use. As a simple example, if you have a hash like { a => 1, b
=> 2 } (using the Perl syntax) you can map it to
<hash><a>1</a>2</hash>. That's easy to generate, but the
reverse transformation is full of error-handling cases, like
<hash><a>1</a>2<c/></hash> and <hash><a>1</a><a>2</a></hash>.
I'm sure experienced XML hackers have ways to work around these
problems, but the XML libraries I've worked with basically don't even
try to turn the thing into any sort of general-purpose data structure.
They just let you ask questions like "What is the root element? OK,
now what elements does it contain? OK, there's an <a> tag there, what
does that have inside it? Any more-deeply-nested tags?". On the
other hand, JSON is explicitly designed to serialize and deserialize
data structures of this type, and it pretty much just works, even
between completely different programming languages.

Since we will be controlling the XML output, we can restrict it to a
form that is equivalent to what JSON and similar serialisation languages
use. We can even produce an XSD schema specifying what is allowed, if
anyone is so minded, and a validating parser could be told to validate
the XML against that schema. And XSLT processing is a very powerful
transformation tool. We could even provide a stylesheet that would turn
the XML into JSON. :-)

Anyway, I think we're getting closer to consensus here.

I think there's a good case for being able to stash the EXPLAIN output
in a table as XML - that way we could slice and dice it several ways
without having to rerun the EXPLAIN.

cheers

andrew

#63

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Andrew Dunstan (#62)

Re: generic options for explain

On Tue, May 26, 2009 at 1:48 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Robert Haas wrote:

On the other hand, XML can be a really difficult technology to work
with because it doesn't map cleanly to the data structures that most
modern scripting languages (Perl, Python, Ruby, and probably Java and
others) use. As a simple example, if you have a hash like { a => 1, b
=> 2 } (using the Perl syntax) you can map it to
<hash><a>1</a>2</hash>. That's easy to generate, but the
reverse transformation is full of error-handling cases, like
<hash><a>1</a>2<c/></hash> and <hash><a>1</a><a>2</a></hash>.
I'm sure experienced XML hackers have ways to work around these
problems, but the XML libraries I've worked with basically don't even
try to turn the thing into any sort of general-purpose data structure.
They just let you ask questions like "What is the root element? OK,
now what elements does it contain? OK, there's an <a> tag there, what
does that have inside it? Any more-deeply-nested tags?". On the
other hand, JSON is explicitly designed to serialize and deserialize
data structures of this type, and it pretty much just works, even
between completely different programming languages.

Since we will be controlling the XML output, we can restrict it to a form
that is equivalent to what JSON and similar serialisation languages use. We
can even produce an XSD schema specifying what is allowed, if anyone is so
minded, and a validating parser could be told to validate the XML against
that schema. And XSLT processing is a very powerful transformation tool. We
could even provide a stylesheet that would turn the XML into JSON. :-)

Yeah, that's fine. I think we should target 4/1/2010 as the
submission date for that stylesheet. :-)

Anyway, I think we're getting closer to consensus here.

I think there's a good case for being able to stash the EXPLAIN output in a
table as XML - that way we could slice and dice it several ways without
having to rerun the EXPLAIN.

Yes, I think there is an excellent case for being able to stash any
output format into a table.

...Robert

#64

Greg Stark

greg.stark@enterprisedb.com

over 16 years ago

In reply to: Robert Haas (#63)

Re: generic options for explain

(sorry for top posting - stupid apple)

So the real elephant in the room is that the existing explain code is
not really designed to be extensible, configurable, or to be printed
in different formats.

The current code is basically just gobs of text printed by different
routines all over the code base. There are no data structures which
represent what explain prints. The closest thing is the
instrumentation objects which obtain the timing and counts but not the
planner expectations or any associated data.

If we're going to support multiple output formats or options to turn
off and on sections I think we need to build a data structure
independent of the format, have code to include or exclude stats as
requested and then pass that to the requested formatter.

--
Greg

On 26 May 2009, at 18:53, Robert Haas <robertmhaas@gmail.com> wrote:

Show quoted text

On Tue, May 26, 2009 at 1:48 PM, Andrew Dunstan
<andrew@dunslane.net> wrote:

Robert Haas wrote:

On the other hand, XML can be a really difficult technology to work
with because it doesn't map cleanly to the data structures that most
modern scripting languages (Perl, Python, Ruby, and probably Java
and
others) use. As a simple example, if you have a hash like { a =>
1, b
=> 2 } (using the Perl syntax) you can map it to
<hash><a>1</a>2</hash>. That's easy to generate, but the
reverse transformation is full of error-handling cases, like
<hash><a>1</a>2<c/></hash> and <hash><a>1</a><a>2</a></hash>.
I'm sure experienced XML hackers have ways to work around these
problems, but the XML libraries I've worked with basically don't
even
try to turn the thing into any sort of general-purpose data
structure.
They just let you ask questions like "What is the root element?
OK,
now what elements does it contain? OK, there's an <a> tag there,
what
does that have inside it? Any more-deeply-nested tags?". On the
other hand, JSON is explicitly designed to serialize and deserialize
data structures of this type, and it pretty much just works, even
between completely different programming languages.

Since we will be controlling the XML output, we can restrict it to
a form
that is equivalent to what JSON and similar serialisation languages
use. We
can even produce an XSD schema specifying what is allowed, if
anyone is so
minded, and a validating parser could be told to validate the XML
against
that schema. And XSLT processing is a very powerful transformation
tool. We
could even provide a stylesheet that would turn the XML into
JSON. :-)

Yeah, that's fine. I think we should target 4/1/2010 as the
submission date for that stylesheet. :-)

Anyway, I think we're getting closer to consensus here.

I think there's a good case for being able to stash the EXPLAIN
output in a
table as XML - that way we could slice and dice it several ways
without
having to rerun the EXPLAIN.

Yes, I think there is an excellent case for being able to stash any
output format into a table.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#65

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Greg Stark (#64)

Re: generic options for explain

Greg Stark <greg.stark@enterprisedb.com> writes:

So the real elephant in the room is that the existing explain code is
not really designed to be extensible, configurable, or to be printed
in different formats.

These are implementation details ;-). Let's get a definition that
everyone can sign off on, and then worry about what has to be done
to the code to make it happen. Even if we end up throwing away and
rewriting all of explain.c, that's not *that* much code.

regards, tom lane

#66

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Greg Stark (#64)

Re: generic options for explain

On Tue, May 26, 2009 at 3:04 PM, Greg Stark <greg.stark@enterprisedb.com> wrote:

(sorry for top posting - stupid apple)

So the real elephant in the room is that the existing explain code is not
really designed to be extensible, configurable, or to be printed in
different formats.

The current code is basically just gobs of text printed by different
routines all over the code base. There are no data structures which

All over the code base? It looks to me like most of it is in
explain.c, specifically explain_outNode().

(On an unrelated point, it's difficult to imagine why someone thought
that was a good way of capitalizing & punctuating that function name.)

represent what explain prints. The closest thing is the instrumentation
objects which obtain the timing and counts but not the planner expectations
or any associated data.

If we're going to support multiple output formats or options to turn off and
on sections I think we need to build a data structure independent of the
format, have code to include or exclude stats as requested and then pass
that to the requested formatter.

That sounds about right to me. I think that representation can be
pretty thin, though, maybe just a big struct with all the attributes
that are applicable to any node type and pointers to its left and
right children.

...Robert

#67

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#65)

Re: generic options for explain

On Tue, May 26, 2009 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <greg.stark@enterprisedb.com> writes:

So the real elephant in the room is that the existing explain code is
not really designed to be extensible, configurable, or to be printed
in different formats.

These are implementation details ;-). Let's get a definition that
everyone can sign off on, and then worry about what has to be done
to the code to make it happen. Even if we end up throwing away and
rewriting all of explain.c, that's not *that* much code.

I'm actually not sure there's a whole lot to hash out... I was going
to take a crack at writing some code.

...Robert

#68

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Robert Haas (#67)

Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 26, 2009 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

These are implementation details ;-). �Let's get a definition that
everyone can sign off on, and then worry about what has to be done
to the code to make it happen.

I'm actually not sure there's a whole lot to hash out... I was going
to take a crack at writing some code.

I still haven't seen anything but formless handwaving as far as the "SQL
table" output format goes. For that matter, there's not much more than
handwaving behind the "XML" meme either. Show us a spec for the output
format, then think about code. (This was somewhere around slide ten
here: http://momjian.us/main/writings/pgsql/patch.pdf ;-))

regards, tom lane

#69

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#68)

Re: generic options for explain

On Tue, May 26, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 26, 2009 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

These are implementation details ;-). Let's get a definition that
everyone can sign off on, and then worry about what has to be done
to the code to make it happen.

I'm actually not sure there's a whole lot to hash out... I was going
to take a crack at writing some code.

I still haven't seen anything but formless handwaving as far as the "SQL
table" output format goes. For that matter, there's not much more than
handwaving behind the "XML" meme either. Show us a spec for the output
format, then think about code. (This was somewhere around slide ten
here: http://momjian.us/main/writings/pgsql/patch.pdf ;-))

OK, how about this:

http://archives.postgresql.org/message-id/603c8f070905241827g74c8bf9cie9d98e38037a8356@mail.gmail.com

I note in passing that there have been 51 messages posted to this
thread since I wrote that email, and none of the were responses to it.

At any rate, that email might not be as detailed as what you're
looking for, but it's certainly a start. I don't really know how the
table-format output is going to work out; I have to look at the code
more to get a feeling for that. But I think with respect to XML or
JSON, there really aren't too many options for how it can look, modulo
minor syntax tweaks like arguing about whether the join type should be
labelled "JoinType" or "jointype" or "join_type". Still, if you have
comments or think I'm overlooking something important, I definitely
would like to know about that now before I put more time into it.

I recognize that we haven't come to a consensus on the best possible
syntax for EXPLAIN options, but it seems to me that threshold issue
for improving EXPLAIN is everyone agreeing that we're going to allow
for some kind of extendable syntax that doesn't rely on all options
being keywords (presented in a fixed order, no less!). You caved in
on that point upthread and I don't think we have any other holdouts.
Now, of course, my syntax is the best possible one in the entire
universe, but if by chance there is a technically feasible alternative
syntax on which more than one person can agree (note: this has not
happened yet), adjusting my patch to use that syntax rather than the
one I stole from Peter shouldn't be too hard.

A second issue on which we don't have consensus is a method to capture
explain output. I am 100% of the opinion that there are only two
sensible things to do here: (1) make EXPLAIN a fully reserved keyword
so that we can use it just like a SELECT, or (2) provide a built-in
function like pg_explain() that calls EXPLAIN with a user-specified
set of arguments, and which third-party tools can count on to be
installed. Since you labelled (1) as a non-starter and AFAICS you're
the only holdout on making (2) a built-in rather than something
everyone has to define for themselves, I'm hopeful that we'll bring
you around. :-)

The final issue on which we don't have a clear consensus is what OTHER
new options we want for EXPLAIN aside from choice of output format. I
posted a few ideas that I have and solicited some others upthread, but
I think that the volume of email on other aspects of this patch has
deprived people of the necessary time and space to think about how
they might like to use an extensible options syntax once we have it -
not to mention that the original patch was only posted 3 days ago and
on a day when many of us were on airplanes, about to get on airplanes,
or still jet-lagged. Personally, I think that that's the most
interesting aspect of this whole project so I hope it gets some
attention going forward, but I'm not too concerned about the exact
timing of that attention. The point is that people not-infrequently
come up with more stuff they'd like to see in EXPLAIN output, and
those ideas get shot down because we don't have the syntax. If we fix
the syntax, those ideas will come back around again in due course, and
we'll be able to consider them on their merits rather than
peremptorily shooting them down.

...Robert

#70

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Robert Haas (#69)

Re: generic options for explain

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 26, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I still haven't seen anything but formless handwaving as far as the "SQL
table" output format goes. �For that matter, there's not much more than
handwaving behind the "XML" meme either.

OK, how about this:
http://archives.postgresql.org/message-id/603c8f070905241827g74c8bf9cie9d98e38037a8356@mail.gmail.com

I note in passing that there have been 51 messages posted to this
thread since I wrote that email, and none of the were responses to it.

Well, we were having too much fun arguing about trivia ;-). And I
suspect a number of people were too jet-lagged to keep track of what
they'd read and what not. Anyway, good, we have a starting point.

Some issues that I see here:

1. You seem to be assuming that each table row will represent exactly
one plan node, no more, no less. That's okay as a first approximation
but it breaks down on closer examination. In particular, where will you
hang the information that's already available about trigger execution
costs? Those are not associated with any particular plan node, as they
occur atop the whole plan. The same goes for the total execution time
of course, and I can foresee other types of stats that we might gather
someday that would be hard to tie to any specific plan node.

In XML this is soluble by having a toplevel node <ExplainResults> that
contains not only the plan tree but other children. I'm not seeing how
to translate that into a SQL table, though. Or at least not just one
SQL table.

2. You didn't say anything about how any but simple scalar fields will
be represented. Filter conditions and sort keys are particularly
interesting here. I'm not really happy with just plopping down the same
textual output we have now --- that is just as human-friendly-and-not-
machine-friendly as before, only with a slightly smaller scope. I can
foresee for example that someone might wish to extract the second or
third sort key expression from a Sort node's sort key list. Or what
about problems such as "find which nodes this field is used in"?

3. You left us with a handwave about how the tree structure will be
represented in a table. Needs to be explicit. And it's not just
simple child relationships that should be represented ... tell us
about initplans and subplans, too.

4. The point about having lots of NULL columns is an annoyance that
could escalate to the point of near unusability. To get a feeling for
how workable that is, we need a pretty exact list of the set of output
columns, not just a rough list of the kinds of things that will be
there.

regards, tom lane

#71

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Tom Lane (#70)

Re: generic options for explain

On Tue, May 26, 2009 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 26, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I still haven't seen anything but formless handwaving as far as the "SQL
table" output format goes. For that matter, there's not much more than
handwaving behind the "XML" meme either.

OK, how about this:
http://archives.postgresql.org/message-id/603c8f070905241827g74c8bf9cie9d98e38037a8356@mail.gmail.com

I note in passing that there have been 51 messages posted to this
thread since I wrote that email, and none of the were responses to it.

Well, we were having too much fun arguing about trivia ;-). And I
suspect a number of people were too jet-lagged to keep track of what
they'd read and what not. Anyway, good, we have a starting point.

Some issues that I see here:

1. You seem to be assuming that each table row will represent exactly
one plan node, no more, no less. That's okay as a first approximation
but it breaks down on closer examination. In particular, where will you
hang the information that's already available about trigger execution
costs? Those are not associated with any particular plan node, as they
occur atop the whole plan. The same goes for the total execution time
of course, and I can foresee other types of stats that we might gather
someday that would be hard to tie to any specific plan node.

In XML this is soluble by having a toplevel node <ExplainResults> that
contains not only the plan tree but other children. I'm not seeing how
to translate that into a SQL table, though. Or at least not just one
SQL table.

2. You didn't say anything about how any but simple scalar fields will
be represented. Filter conditions and sort keys are particularly
interesting here. I'm not really happy with just plopping down the same
textual output we have now --- that is just as human-friendly-and-not-
machine-friendly as before, only with a slightly smaller scope. I can
foresee for example that someone might wish to extract the second or
third sort key expression from a Sort node's sort key list. Or what
about problems such as "find which nodes this field is used in"?

3. You left us with a handwave about how the tree structure will be
represented in a table. Needs to be explicit. And it's not just
simple child relationships that should be represented ... tell us
about initplans and subplans, too.

4. The point about having lots of NULL columns is an annoyance that
could escalate to the point of near unusability. To get a feeling for
how workable that is, we need a pretty exact list of the set of output
columns, not just a rough list of the kinds of things that will be
there.

Responding to these in bulk, I think that 1, 3, and 4 are pretty
convincing arguments that the SQL-based output format is
underspecified. I hereby promise not to do anything about that
without further discussion, which is an easy promise to make
considering that in light of those comments I have no idea what it
should look like. I think (1) is the most damning point. However, as
far as I can see, none of these will affect XML or JSON.

With respect to (2), I think we should output the same text format
that we have now, for starters. I agree that's not the only thing
that someone might want, but I think there's a pretty good argument
that it's ONE thing that someone might reasonably want, depending on
the application. If someone cares to build a better mousetrap in this
area, it can be added on once we figure out the design, and without
breaking anything! - that's sort of the whole point of this exercise.

...Robert

#72

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 16 years ago

In reply to: Robert Haas (#71)

Re: generic options for explain

Sorry to come in on this discussion so late. Just catching up....

Robert Haas <robertmhaas@gmail.com> wrote:

Responding to these in bulk, I think that 1, 3, and 4 are pretty
convincing arguments that the SQL-based output format is
underspecified. I hereby promise not to do anything about that
without further discussion, which is an easy promise to make
considering that in light of those comments I have no idea what it
should look like. I think (1) is the most damning point. However,
as far as I can see, none of these will affect XML or JSON.

Personally, I find XML to be very hard to read; however, I can see the
value of writing to that and having someone who can tolerate XSLT turn
XML into anything else we want. (That could include morphing it into
SELECT statements with the literals to present it as a tuple set, I
should think.) As long as nobody considers this issue "done" until
there are useful and convenient ways to display and use the data
within psql without having to look at the XML, that seems a reasonable
approach.

The big plus of the current technique is that it is so convenient to
Ctrl+C something which is running too long, arrow up, hit Home, and
put the EXPLAIN word in front. Turning the query into a character
string literal and feeding it to a function would be a big step
backward.

A big down side of the current technique is that you can't get both
the results of a SELECT and its plan. I haven't seen any discussion
here about emitting the EXPLAIN output through some INFO messages or
some such, and letting the query return its normal results, but I feel
that would be a significant improvement, if it that be done.

Also, something I miss from previous database products is a way to
control the verbosity of the output when planning. I do think that
needs to be some sort of option up front, not a filter phase, because
of the high cost it can have. If there was a way to show all the
candidate plans and their cost estimates in a run time environment,
without any special build or configuration needed, I'd use it every
now and then.

-Kevin

#73

ioguix

ioguix@free.fr

over 16 years ago

In reply to: Magnus Hagander (#52)

Re: generic options for explain

Magnus Hagander a ï¿½crit :

Dave Page wrote:

I was thinking something similar, but from the pgAdmin perspective. We
already use libxml2, but JSON would introduce another dependency for
us.

Yeah, but probably not a huge one. There is one for wx, but I don't
think it's included by default.

+1 for the machine readable explain.

FWIW, I have an early patch for phpPgAdmin about a graphical explain.
IIRC when I wrote it, I told myself the parser might actually be broken
with multi-level sub-queries or something. But I ended with the same
parsing code than pgAdmin anyway.

About the format, JSON would be the best here, as it is a one function
call in PHP to retrieve an associative array from the JSON code.

--
Guillaume (ioguix) de Rorthais

#74

Greg Stark

stark@enterprisedb.com

over 16 years ago

In reply to: Kevin Grittner (#72)

Re: generic options for explain

On Wed, May 27, 2009 at 1:30 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

The big plus of the current technique is that it is so convenient to
Ctrl+C something which is running too long, arrow up, hit Home, and
put the EXPLAIN word in front. Turning the query into a character
string literal and feeding it to a function would be a big step
backward.

A big down side of the current technique is that you can't get both
the results of a SELECT and its plan. I haven't seen any discussion
here about emitting the EXPLAIN output through some INFO messages or
some such, and letting the query return its normal results, but I feel
that would be a significant improvement, if it that be done.

Would something like this address both of your issues?

http://article.gmane.org/gmane.comp.db.postgresql.devel.patches/21614/match=siginfo

It let you hit a control character while the query was running to view
the explain analyze for the results so far. The query kept running and
you could request further updates whenever you wanted.

--
greg

#75

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 16 years ago

In reply to: Greg Stark (#74)

Re: generic options for explain

Greg Stark <stark@enterprisedb.com> wrote:

http://article.gmane.org/gmane.comp.db.postgresql.devel.patches/21614/match=siginfo