machine-readable explain output v4

Started by Robert Haas over 16 years ago · 76 messages · pgsql-hackers
#1 Robert Haas
robertmhaas@gmail.com

OK, here's the updated version of my machine-readable explain output
patch. This needed heavy updating as a result of the changes that Tom
asked me to make to the explain options patch, and the further changes
he made himself. In addition to any regressions I may have introduced
during the rebasing process, there is one definite problem here: in
the previous version of this patch, explain (format xml) returned XML
data; now, it's back to text.

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery. Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

...Robert

Attachments:

explain_format-v4.patch (text/x-diff; charset=US-ASCII) +1338 -1037
#2 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: machine-readable explain output v4

Hi Robert, Hi all,

On Thursday 30 July 2009 05:05:48 Robert Haas wrote:

OK, here's the updated version of my machine-readable explain output
patch. This needed heavy updating as a result of the changes that Tom
asked me to make to the explain options patch, and the further changes
he made himself. In addition to any regressions I may have introduced
during the rebasing process, there is one definite problem here: in
the previous version of this patch, explain (format xml) returned XML
data; now, it's back to text.

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery. Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

Hm. I think it's cleaner to move the option parsing into its own function - that
makes three places parsing options, and it's probably not going to get fewer.
I am not sure this is cleaner than including the parsed options in ExplainStmt
though... (possibly in a separate struct to avoid changing the copy/equal funcs
every time)

Some more comments on the (new) version of the patch:
- The regression tests are gone?
- Currently a value scan looks like »Values Scan on "*VALUES*"«. What about
adding its alias, at least in verbose mode? This currently is inconsistent with
other scans. Also, the output columns of a VALUES scan are named column$N even
if names are specified, as in AS foo(colname)
- Why do xml/json contain no time units anymore (e.g. Total Runtime)?
Admittedly that's already inconsistent in the current text format...

- Code patterns like:
		if (es->format == EXPLAIN_FORMAT_TEXT)
			appendStringInfo(es->str, "Total runtime: %.3f ms\n",
							 1000.0 * totaltime);
		else if (es->format == EXPLAIN_FORMAT_XML)
			appendStringInfo(es->str,
							 "    <Total-Runtime>%.3f</Total-Runtime>\n",
							 1000.0 * totaltime);
		else if (es->format == EXPLAIN_FORMAT_JSON)
			appendStringInfo(es->str, ",\n    \"Total runtime\" : %.3f",
							 1000.0 * totaltime);
or
		if (es->format == EXPLAIN_FORMAT_TEXT)
			appendStringInfo(es->str, " for constraint %s", conname);
		else if (es->format == EXPLAIN_FORMAT_XML)
		{
			appendStringInfoString(es->str, "        <Constraint-Name>");
			escape_xml(es->str, conname);
			appendStringInfoString(es->str, "</Constraint-Name>\n");
		}
		else if (es->format == EXPLAIN_FORMAT_JSON)
		{
			appendStringInfo(es->str, "\n        \"Constraint Name\": ");
			escape_json(es->str, conname);
		}

possibly could be simplified using ExplainPropertyText or a function accepting
a format string.
At least for the !EXPLAIN_FORMAT_TEXT cases this seems simple to do at multiple
places in ExplainOnePlan and report_triggers.

On Friday 31 July 2009 23:13:54 Robert Haas wrote:

On Sat, Jul 18, 2009 at 10:29 PM, Andres Freund<andres@anarazel.de> wrote:

One part where I find the code flow ugly is 'did_boilerplate' in
report_triggers/its call sites.
I can see why it is done that way, but it's not exactly obvious to read
when you want to find out how the format looks.

Suggestions?

The only idea I have, without building more xml/json infrastructure, is using a
separate string buffer inside report_triggers - but that's not much nicer.

That's all I could find right now...

Andres

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#2)
Re: machine-readable explain output v4

On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres@anarazel.de> wrote:

Hi Robert, Hi all,

On Thursday 30 July 2009 05:05:48 Robert Haas wrote:

OK, here's the updated version of my machine-readable explain output
patch.  This needed heavy updating as a result of the changes that Tom
asked me to make to the explain options patch, and the further changes
he made himself.  In addition to any regressions I may have introduced
during the rebasing process, there is one definite problem here: in
the previous version of this patch, explain (format xml) returned XML
data; now, it's back to text.

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery.  Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

Hm. I think it's cleaner to move the option parsing into its own function - that
makes three places parsing options, and it's probably not going to get fewer.
I am not sure this is cleaner than including the parsed options in ExplainStmt
though... (possibly in a separate struct to avoid changing the copy/equal funcs
every time)

Well, this is why we need Tom to weigh in.

Some more comments on the (new) version of the patch:
- The regression tests are gone?

Tom added some that look adequate to me to create_index.sql, as a
separate commit, so I don't think I need to do this in my patch any
more. Maybe some of those examples should be changed to output JSON
or XML, though, but I'd rather leave this up to Tom's discretion on
commit because I think he has opinions about this and I think my
chances of guessing what they are are low.

- Currently a value scan looks like »Values Scan on "*VALUES*"« What about
adding its alias at least in verbose mode? This currently is inconsistent with
other scans.

I don't know why this doesn't work, but I think it's unrelated to this patch.

Also, the output columns of a VALUES scan are named column$N even
if names are specified, as in AS foo(colname)

This is consistent with how other types of scans are treated.

rhaas=# explain verbose select x,y,z from (select * from pg_class) a(x,y,z);
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on pg_catalog.pg_class (cost=0.00..8.44 rows=244 width=72)
Output: pg_class.relname, pg_class.relnamespace, pg_class.reltype
(2 rows)

- Why do xml/json contain no time units anymore (e.g. Total Runtime)?
Admittedly that's already inconsistent in the current text format...

I'm not sure what you mean by "any more". I don't think any version
of these patches I ever submitted did otherwise, nor do I think it's
particularly valuable. Costs are implicitly in units of
PostgreSQL-costing and times are implicitly in units of milliseconds,
just as they are in the text format. Changing this would require
clients to strip off the trailing 'ms' before converting to a
floating-point number, which seems like an irritation with no
corresponding benefit.

- Code patterns like:
               if (es->format == EXPLAIN_FORMAT_TEXT)
                       appendStringInfo(es->str, "Total runtime: %.3f ms\n",
                                                        1000.0 * totaltime);
               else if (es->format == EXPLAIN_FORMAT_XML)
                       appendStringInfo(es->str,
                                                        "    <Total-Runtime>%.3f</Total-Runtime>\n",
                                                        1000.0 * totaltime);
               else if (es->format == EXPLAIN_FORMAT_JSON)
                       appendStringInfo(es->str, ",\n    \"Total runtime\" : %.3f",
                                                        1000.0 * totaltime);
or
                       if (es->format == EXPLAIN_FORMAT_TEXT)
                               appendStringInfo(es->str, " for constraint %s", conname);
                       else if (es->format == EXPLAIN_FORMAT_XML)
                       {
                               appendStringInfoString(es->str, "        <Constraint-Name>");
                               escape_xml(es->str, conname);
                               appendStringInfoString(es->str, "</Constraint-Name>\n");
                       }
                       else if (es->format == EXPLAIN_FORMAT_JSON)
                       {
                               appendStringInfo(es->str, "\n        \"Constraint Name\": ");
                               escape_json(es->str, conname);
                       }

possibly could be simplified using ExplainPropertyText or a function accepting
a format string.
At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple places in
ExplainOnePlan and report_triggers.

Well, the whole explain output format is pretty idiosyncratic, and I
had to work pretty hard to beat it into submission. I think that it
would not be totally trivial to do what you're suggesting here because
it would require adding code to manage es->indent outside of
ExplainPrintPlan(), which we currently don't. I'm not sure whether
that works out to a net win.

...Robert

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: machine-readable explain output v4

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres@anarazel.de> wrote:

- Currently a value scan looks like »Values Scan on "*VALUES*"« What about
adding its alias at least in verbose mode? This currently is inconsistent with
other scans.

I don't know why this doesn't work, but I think it's unrelated to this patch.

If you mean something like

regression=# explain select * from (values(1)) ss;
QUERY PLAN
-------------------------------------------------------------
Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
(1 row)

I think the reason is that the alias applies to a SubqueryScan node that
later gets optimized away. The VALUES per se doesn't have an alias.

regards, tom lane

#5 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#3)
Re: machine-readable explain output v4

On Sunday 02 August 2009 23:34:04 Robert Haas wrote:

On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres@anarazel.de> wrote:

Hi Robert, Hi all,

On Thursday 30 July 2009 05:05:48 Robert Haas wrote:

OK, here's the updated version of my machine-readable explain output
patch. This needed heavy updating as a result of the changes that Tom
asked me to make to the explain options patch, and the further changes
he made himself. In addition to any regressions I may have introduced
during the rebasing process, there is one definite problem here: in
the previous version of this patch, explain (format xml) returned XML
data; now, it's back to text.

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery. Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

Hm. I think it's cleaner to move the option parsing into its own function
- that makes three places parsing options, and it's probably not going to get
fewer. I am not sure this is cleaner than including the parsed options in
ExplainStmt though... (possibly in a separate struct to avoid changing the
copy/equal funcs every time)

Well, this is why we need Tom to weigh in.

Yes.

Some more comments on the (new) version of the patch:
- The regression tests are gone?

Tom added some that look adequate to me to create_index.sql, as a
separate commit, so I don't think I need to do this in my patch any
more. Maybe some of those examples should be changed to output JSON
or XML, though, but I'd rather leave this up to Tom's discretion on
commit because I think he has opinions about this and I think my
chances of guessing what they are are low.

Yea, I was referring to ones using xml/json.

- Currently a value scan looks like »Values Scan on "*VALUES*"« What
about adding its alias at least in verbose mode? This currently is
inconsistent with other scans.

I don't know why this doesn't work, but I think it's unrelated to this
patch.

True.

Also, the output columns of a VALUES scan are named column$N even
if names are specified, as in AS foo(colname)

This is consistent with how other types of scans are treated.
rhaas=# explain verbose select x,y,z from (select * from pg_class) a(x,y,z);
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class  (cost=0.00..8.44 rows=244 width=72)
   Output: pg_class.relname, pg_class.relnamespace, pg_class.reltype
(2 rows)

This is somewhat weird, considering that using it directly on a relation works
differently:
explain (verbose) SELECT * FROM pg_namespace AS f(a,b,c);
                                 QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_namespace f  (cost=0.00..1.06 rows=6 width=100)
   Output: a, b, c
(2 rows)

Not your "fault", though. Still, it's rather strange and looks worth fixing.

- Why do xml/json contain no time units anymore (e.g. Total Runtime)?
Admittedly that's already inconsistent in the current text format...

I'm not sure what you mean by "any more". I don't think any version
of these patches I ever submitted did otherwise, nor do I think it's
particularly valuable. Costs are implicitly in units of
PostgreSQL-costing and times are implicitly in units of milliseconds,
just as they are in the text format. Changing this would require
clients to strip off the trailing 'ms' before converting to a
floating-point number, which seems like an irritation with no
corresponding benefit.

I did not think any of your patches did - it was just a difference between the
original output and the new formats that I noted - and as I said, it's not even
consistent in the text format.

- Code patterns like:
		if (es->format == EXPLAIN_FORMAT_TEXT)
			appendStringInfo(es->str, "Total runtime: %.3f ms\n",
							 1000.0 * totaltime);
		else if (es->format == EXPLAIN_FORMAT_XML)
			appendStringInfo(es->str,
							 "    <Total-Runtime>%.3f</Total-Runtime>\n",
							 1000.0 * totaltime);
		else if (es->format == EXPLAIN_FORMAT_JSON)
			appendStringInfo(es->str, ",\n    \"Total runtime\" : %.3f",
							 1000.0 * totaltime);
or
		if (es->format == EXPLAIN_FORMAT_TEXT)
			appendStringInfo(es->str, " for constraint %s", conname);
		else if (es->format == EXPLAIN_FORMAT_XML)
		{
			appendStringInfoString(es->str, "        <Constraint-Name>");
			escape_xml(es->str, conname);
			appendStringInfoString(es->str, "</Constraint-Name>\n");
		}
		else if (es->format == EXPLAIN_FORMAT_JSON)
		{
			appendStringInfo(es->str, "\n        \"Constraint Name\": ");
			escape_json(es->str, conname);
		}

possibly could be simplified using ExplainPropertyText or a function
accepting a format string.
At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple
places in ExplainOnePlan and report_triggers.

Well, the whole explain output format is pretty idiosyncratic, and I
had to work pretty hard to beat it into submission. I think that it
would not be totally trivial to do what you're suggesting here because
it would require adding code to manage es->indent outside of
ExplainPrintPlan(), which we currently don't. I'm not sure whether
that works out to a net win.

That's why I suggested doing it for JSON/XML only, e.g. as in the attached
patch. The result looks simpler to my eyes.

While re-checking this I noticed a newly introduced bug in report_triggers():
in the conname == false case, "Trigger Name" gets printed twice due to a
duplicated else - a merge glitch? (fixed in the attached patch as well)

Attachments:

0001-reduce-json-xml-duplication-fix-duplicate-output.patch (text/x-patch; charset=UTF-8) +74 -87
#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: machine-readable explain output v4

Robert Haas <robertmhaas@gmail.com> writes:

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery.  Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

I don't see anything particularly wrong with having ExplainResultDesc
do the same kind of thing GetCommandLogLevel is doing.

The alternative is to have it call a function that extracts all the
parameters into an ExplainState, but the thing I have against that is
that it makes for a larger cross-section of user mistakes that will
result in throwing an elog() before we actually reach execution.
I didn't want GetCommandLogLevel throwing any errors it doesn't actually
have to, because that would interfere with logging of statements that
could perfectly well get logged normally. I'm not sure if there are any
comparable reasons to not want errors from UtilityTupleDescriptor, but
there could well be.

- The regression tests are gone?

Tom added some that look adequate to me to create_index.sql, as a
separate commit, so I don't think I need to do this in my patch any
more. Maybe some of those examples should be changed to output JSON
or XML, though, but I'd rather leave this up to Tom's discretion on
commit because I think he has opinions about this and I think my
chances of guessing what they are are low.

Well, of course the existing tests are not going to exercise XML or
JSON output format. Dunno how much we care. I had supposed that
XML or JSON would always emit all the fields and leave it to the
recipient to suppress what they don't want. If we want to have
platform-independent regression tests then we'd need to make the
COSTS option effective for XML/JSON format --- do we want that?

I'm not sure what you mean by "any more". I don't think any version
of these patches I ever submitted did otherwise, nor do I think it's
particularly valuable. Costs are implicitly in units of
PostgreSQL-costing and times are implicitly in units of milliseconds,
just as they are in the text format. Changing this would require
clients to strip off the trailing 'ms' before converting to a
floating-point number, which seems like an irritation with no
corresponding benefit.

I agree with not labeling these numbers. We definitely can't label
the costs with anything useful, and labeling runtimes would be a
bit inconsistent.

regards, tom lane

#7 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#6)
Re: machine-readable explain output v4

On Monday 03 August 2009 01:57:48 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

- The regression tests are gone?

Tom added some that look adequate to me to create_index.sql, as a
separate commit, so I don't think I need to do this in my patch any
more. Maybe some of those examples should be changed to output JSON
or XML, though, but I'd rather leave this up to Tom's discretion on
commit because I think he has opinions about this and I think my
chances of guessing what they are are low.

Well, of course the existing tests are not going to exercise XML or
JSON output format. Dunno how much we care. I had supposed that
XML or JSON would always emit all the fields and leave it to the
recipient to suppress what they don't want. If we want to have
platform-independent regression tests then we'd need to make the
COSTS option effective for XML/JSON format --- do we want that?

Options such as COSTS do affect XML/JSON right now. While not important for
COSTS itself, I think it's generally good to do so, because a certain option
might not be on by default for efficiency reasons, and I don't see a reason to
special-case COSTS.

Andres

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: machine-readable explain output v4

On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Well, of course the existing tests are not going to exercise XML or
JSON output format.  Dunno how much we care.  I had supposed that
XML or JSON would always emit all the fields and leave it to the
recipient to suppress what they don't want.  If we want to have
platform-independent regression tests then we'd need to make the
COSTS option effective for XML/JSON format --- do we want that?

Well, as I've said many, many times on these threads, I feel strongly
that the choice of output format shouldn't for the most part affect
the information which is displayed. Output format should be an
option, and the information to display should be an option, and the
two should be, as far as possible, separate. So what you're
suggesting is the way it works now.

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00879.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00916.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00917.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00989.php

...Robert

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: machine-readable explain output v4

On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery.  Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

I don't see anything particularly wrong with having ExplainResultDesc
do the same kind of thing GetCommandLogLevel is doing.

After I did this, I thought it would be useful to add a regression
test to make sure that it is doing the right thing. So I came up with
this:

CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$
DECLARE
       x RECORD;
BEGIN
       EXECUTE 'explain (format ' || $1 || ') select 1' INTO x;
       RETURN pg_typeof(x."QUERY PLAN");
END
$$ LANGUAGE plpgsql;

This works the first time you run it in a particular session, but then
if you change $1 so as to get a different answer, it fails:

rhaas=# select test_explain_format('text');
test_explain_format
---------------------
text
(1 row)

rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard
ALL PLANS TEMP
rhaas=# discard plans;
DISCARD PLANS
rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard all;
DISCARD ALL
rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=#

If I quit psql and start back up again, then it works:

rhaas=# select test_explain_format('xml');
test_explain_format
---------------------
xml
(1 row)

So I guess that leads me to -

(1) How do I make this work?
(2) Is it worth making this work?

...Robert

#10 Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#9)
Re: machine-readable explain output v4

Robert Haas wrote:

On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery. Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

I don't see anything particularly wrong with having ExplainResultDesc
do the same kind of thing GetCommandLogLevel is doing.

After I did this, I thought it would be useful to add a regression
test to make sure that it is doing the right thing. So I came up with
this:

CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$
DECLARE
x RECORD;
BEGIN
EXECUTE 'explain (format ' || $1 || ') select 1' INTO x;
RETURN pg_typeof(x."QUERY PLAN");
END
$$ LANGUAGE plpgsql;

This works the first time you run it in a particular session, but then
if you change $1 so as to get a different answer, it fails:

rhaas=# select test_explain_format('text');
test_explain_format
---------------------
text
(1 row)

rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard
ALL PLANS TEMP
rhaas=# discard plans;
DISCARD PLANS
rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard all;
DISCARD ALL
rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=#

If I quit psql and start back up again, then it works:

rhaas=# select test_explain_format('xml');
test_explain_format
---------------------
xml
(1 row)

So I guess that leads me to -

(1) How do I make this work?
(2) Is it worth making this work?

You could have the function create an inner function which it then runs
and drops. I have had to perform such gymnastics in the past to get
around similar problems. And yes, it's ugly as hell.

cheers

andrew

#11 Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#10)
Re: machine-readable explain output v4

On Wed, Aug 5, 2009 at 7:20 AM, Andrew Dunstan<andrew@dunslane.net> wrote:

Robert Haas wrote:

On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery.  Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

I don't see anything particularly wrong with having ExplainResultDesc
do the same kind of thing GetCommandLogLevel is doing.

After I did this, I thought it would be useful to add a regression
test to make sure that it is doing the right thing.  So I came up with
this:

CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$
DECLARE
       x RECORD;
BEGIN
       EXECUTE 'explain (format ' || $1 || ') select 1' INTO x;
       RETURN pg_typeof(x."QUERY PLAN");
END
$$ LANGUAGE plpgsql;

This works the first time you run it in a particular session, but then
if you change $1 so as to get a different answer, it fails:

rhaas=# select test_explain_format('text');
 test_explain_format
---------------------
 text
(1 row)

rhaas=# select test_explain_format('xml');
ERROR:  type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT:  PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard
ALL    PLANS  TEMP
rhaas=# discard plans;
DISCARD PLANS
rhaas=# select test_explain_format('xml');
ERROR:  type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT:  PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard all;
DISCARD ALL
rhaas=# select test_explain_format('xml');
ERROR:  type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT:  PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=#

If I quit psql and start back up again, then it works:

rhaas=# select test_explain_format('xml');
 test_explain_format
---------------------
 xml
(1 row)

So I guess that leads me to -

(1) How do I make this work?
(2) Is it worth making this work?

You could have the function create an inner function which it then runs and
drops. I have had to perform such gymnastics in the past to get around
similar problems. And yes, it's ugly as hell.

<hurls>

Well, I guess I could do it like this:

CREATE OR REPLACE FUNCTION test_explain_format() RETURNS text[] AS $$
DECLARE
       xt RECORD;
       xx RECORD;
       xj RECORD;
BEGIN
       EXPLAIN (FORMAT TEXT) SELECT 1 INTO xt;
       EXPLAIN (FORMAT XML) SELECT 1 INTO xx;
       EXPLAIN (FORMAT JSON) SELECT 1 INTO xj;
       RETURN ARRAY[
               pg_typeof(xt."QUERY PLAN"),
               pg_typeof(xx."QUERY PLAN"),
               pg_typeof(xj."QUERY PLAN")
       ];
END
$$ LANGUAGE plpgsql;

Fortunately there is not an unlimited space to probe here...

...Robert

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: machine-readable explain output v4

Robert Haas <robertmhaas@gmail.com> writes:

(2) Is it worth making this work?

No, I don't think so. The odds of such a test ever showing anything
interesting seem minimal.

plpgsql's inability to cope with the case would be nice to fix, but
I'm not holding my breath for it...

regards, tom lane

#13 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#5)
Re: machine-readable explain output v4

On Sun, Aug 2, 2009 at 7:29 PM, Andres Freund<andres@anarazel.de> wrote:

Well, the whole explain output format is pretty idiosyncratic, and I
had to work pretty hard to beat it into submission.  I think that it
would not be totally trivial to do what you're suggesting here because
it would require adding code to manage es->indent outside of
ExplainPrintPlan(), which we currently don't.  I'm not sure whether
that works out to a net win.

That's why I suggested doing it for JSON/XML only, e.g. as in the attached
patch. The result looks simpler to my eyes.

I looked at this some more. I think it's a mess. It's probably right
to do what you're suggesting here, but this patch only changes pieces
of it without making the whole thing consistent. report_triggers(),
for example, kludges a value into es->indent but then ignores it when
determining how much to indent <Trigger>, etc. I can't even figure
out why this works now, let alone being able to maintain it down the
line.

I'm working on trying to fix this.

...Robert

#14 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#13)
Re: machine-readable explain output v4

On Wed, Aug 5, 2009 at 10:48 PM, Robert Haas<robertmhaas@gmail.com> wrote:

On Sun, Aug 2, 2009 at 7:29 PM, Andres Freund<andres@anarazel.de> wrote:

Well, the whole explain output format is pretty idiosyncratic, and I
had to work pretty hard to beat it into submission.  I think that it
would not be totally trivial to do what you're suggesting here because
it would require adding code to manage es->indent outside of
ExplainPrintPlan(), which we currently don't.  I'm not sure whether
that works out to a net win.

That's why I suggested doing it for JSON/XML only. E.g. like in the attached
patch. The result looks simpler for my eyes.

I looked at this some more.  I think it's a mess.  It's probably right
to do what you're suggesting here, but this patch only changes pieces
of it without making the whole thing consistent.  report_triggers(),
for example, kludges a value into es->indent but then ignores it when
determining how much to indent <Trigger>, etc.  I can't even figure
out why this works now, let alone being able to maintain it down the
line.

I'm working on trying to fix this.

Revised patch attached. I'm not convinced this is as good as it can
be, but I've been looking at this patch for so long that I'm starting
to get cross-eyed, and I'd like Tom to at least have a look at this
and assess it before we run out of CommitFest.

...Robert

Attachments:

explain_format-v5.patch (text/x-diff; charset=US-ASCII; +1310/-998)
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: machine-readable explain output v4

Robert Haas <robertmhaas@gmail.com> writes:

Revised patch attached. I'm not convinced this is as good as it can
be, but I've been looking at this patch for so long that I'm starting
to get cross-eyed, and I'd like Tom to at least have a look at this
and assess it before we run out of CommitFest.

I'm starting to look at this now. I feel unqualified to opine on the
quality of the XML/JSON schema design, and given the utter lack of
documentation of what that design is, I'm not sure that anyone who
could comment on it has done so. Could we have a spec please?

Also, the JSON code seems a bit messy/poorly factorized. Is there
a reason for that, or just it's not as mature as the XML code?

regards, tom lane

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: machine-readable explain output v4

On Sun, Aug 9, 2009 at 3:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Revised patch attached.  I'm not convinced this is as good as it can
be, but I've been looking at this patch for so long that I'm starting
to get cross-eyed, and I'd like Tom to at least have a look at this
and assess it before we run out of CommitFest.

I'm starting to look at this now.  I feel unqualified to opine on the
quality of the XML/JSON schema design, and given the utter lack of
documentation of what that design is, I'm not sure that anyone who
could comment on it has done so.  Could we have a spec please?

*scratches head*

You're not the first person to make that request, and I'd like to
respond to it well, but I don't really know what to write. Most of
the discussion about the XML/JSON output format thus far has been
around things like whether we should downcase everything, and even the
people offering these comments have mostly labelled them with words to
the effect of "I know this is trivial but...". I think the reason for
this is that explain output is fundamentally a tree, and XML and JSON
both have ways of representing a tree with properties hanging off the
nodes, and this patch uses those ways. I can't figure out what else
there is, so I don't know what alternative designs I should be
explaining why I didn't choose.

The one significant representational choice that I'm aware of having
made is to use nested tags rather than attributes in the XML format.
This seems to me to offer several advantages. First, it's clearly
impossible to standardize on attributes, because attributes can only
be text, and it seems to me that if we're going to try to output
structured data, we want to take that as far as we can, and we have
attributes (like sort keys) that are lists rather than scalars. Using
tags means that they can have substructure when needed. Second, it
seems likely to me that people will want to extend explain further in
the future: indeed, that was the whole point of the explain-options
patch which was already committed. That's pretty simple in the
current design - just add a few more calls to ExplainPropertyText or
ExplainPropertyList in the appropriate place, and you're done. I'm
pretty sure that splitting things up between attributes and nested
tags would complicate such modifications.
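
To make the list-valued case concrete, here is a hypothetical fragment
(the element names are illustrative only, not the patch's exact schema)
showing how a sort node's keys gain substructure under nested tags;
there is no equally natural encoding of this as a single attribute value:

```xml
<Plan>
  <Node-Type>Sort</Node-Type>
  <Sort-Key>
    <Item>t.a</Item>
    <Item>t.b</Item>
  </Sort-Key>
</Plan>
```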

Peter Eisentraut, in an earlier review of this patch, complained about
the format as well, saying something along the lines of "this is
trying to be all things to all people". I don't want to dismiss that
criticism, but neither can I put my finger on the problem. In an
ideal world, we'd like to be all things to all people, but it's
usually not possible to achieve that in practice. Still, it's not
clear to me what need this wouldn't serve. It's possible to generate
the text format from the XML or JSON format, so it should be
well-suited to graphical presentation of explain output. It's also
possible to grope through the output and, say, find the average cost
of all your seqscan nodes, or verify the absence of merge joins, or
anything of that sort that someone might think that they want to do.

In a nutshell, the design is "take all the fields we have now and put
XML/JSON markup around them so they're easier to get to". Maybe
that's not enough of a design, but I don't have any other ideas.

Also, the JSON code seems a bit messy/poorly factorized.  Is there
a reason for that, or just it's not as mature as the XML code?

I wrote them together, so it's not a question of code maturity, but I
wouldn't rule out me being dumb. I'm open to suggestions... AFAICS,
the need to comma-separate list and hash elements is most of the
problem. I had thought about replacing es->needs_separator with a
list so that we could push/pop elements, but I wasn't totally sure
whether that was a good idea.

...Robert

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#16)
Re: machine-readable explain output v4

Robert Haas wrote:

The one significant representational choice that I'm aware of having
made is to use nested tags rather than attributes in the XML format.
This seems to me to offer several advantages. First, it's clearly
impossible to standardize on attributes, because attributes can only
be text, and it seems to me that if we're going to try to output
structured data, we want to take that as far as we can, and we have
attributes (like sort keys) that are lists rather than scalars. Using
tags means that they can have substructure when needed. Second, it
seems likely to me that people will want to extend explain further in
the future: indeed, that was the whole point of the explain-options
patch which was already committed. That's pretty simple in the
current design - just add a few more calls to ExplainPropertyText or
ExplainPropertyList in the appropriate place, and you're done. I'm
pretty sure that splitting things up between attributes and nested
tags would complicate such modifications.

In general, in XML one uses an attribute for a named property of an
object that can only have one value at a time. A classic example is the
dimensions of an object - it can only have one width and height.
Children (nested tags, particularly) are used for things it can have an
arbitrary number of, or things which in turn can have children. The
HTML <p> and <body> elements are (respectively) examples of these.
Generally, attribute values especially should be short - I recently saw
an example that had an entire image hex encoded in an XML attribute,
which struck me as just horrible. Enumerations, date and time values,
booleans, measurements - these are common types of attribute values.
Extracting a value from an attribute is no more or less difficult than
from a nested tag, using the XPath query language.
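
As a sketch of that point (hypothetical markup, not the patch's actual
schema), the same scalar property can be carried either way, and an
XPath query reaches both with similar effort:

```xml
<!-- attribute form; reachable with //Plan/@node-type -->
<Plan node-type="Seq Scan"/>

<!-- child-element form; reachable with //Plan/Node-Type/text() -->
<Plan>
  <Node-Type>Seq Scan</Node-Type>
</Plan>
```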

The XML Schema standard is a language for specifying the structure of a
given XML document type, and while it is undoubtedly complex, it is also
much more powerful than the older DTD mechanism. I think we should be
creating (and publishing) an XML Schema specification for any XML
documents we are producing. There are a number of members of the
community who are equipped to help produce these.

There is probably a good case for using an explicit namespace with such
docs. So we might have something like:

<pg:explain
xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> ....

BTW, has anyone tried validating the XML at all? I just looked very
briefly at the patch at
<http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
I noticed this which makes me suspicious:

+ 	if (es.format == EXPLAIN_FORMAT_XML)
+ 		appendStringInfoString(es.str,
+ 			"<explain xmlns=\"http://www.postgresql.org/2009/explain\";>\n");

That ";" after the attribute is almost certainly wrong. This is a classic case of what I was talking about a month or two ago. Building up XML (or any structured doc, really; XML is not special in this regard) by ad hoc methods is horribly error-prone. If you don't want to rely on libxml, then I think you need to develop a lightweight abstraction rather than just appending to a StringInfo.

cheers

andrew

#18Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#17)
Re: machine-readable explain output v4

On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote:

Robert Haas wrote:

The one significant representational choice that I'm aware of having
made is to use nested tags rather than attributes in the XML format.
This seems to me to offer several advantages. First, it's clearly
impossible to standardize on attributes, because attributes can only
be text, and it seems to me that if we're going to try to output
structured data, we want to take that as far as we can, and we have
attributes (like sort keys) that are lists rather than scalars. Using
tags means that they can have substructure when needed. Second, it
seems likely to me that people will want to extend explain further in
the future: indeed, that was the whole point of the explain-options
patch which was already committed. That's pretty simple in the
current design - just add a few more calls to ExplainPropertyText or
ExplainPropertyList in the appropriate place, and you're done. I'm
pretty sure that splitting things up between attributes and nested
tags would complicate such modifications.

The XML Schema standard is a language for specifying the structure of a
given XML document type, and while it is undoubtedly complex, it is also
much more powerful than the older DTD mechanism. I think we should be
creating (and publishing) an XML Schema specification for any XML
documents we are producing. There are a number of members of the
community who are equipped to help produce these.

I produced/mailed a RELAX NG version for a somewhat older version and I plan to
refresh and document it once the format seems suitably stable. I am not sure
it is yet. If yes, this should not take that long...
(RELAX NG because you can easily convert it into most other XML schema
description languages)

There is probably a good case for using an explicit namespace with such
docs. So we might have something like:

<pg:explain
xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> ....

BTW, has anyone tried validating the XML at all? I just looked very
briefly at the patch at
<http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
I noticed this which makes me suspicious:

+ 	if (es.format == EXPLAIN_FORMAT_XML)
+ 		appendStringInfoString(es.str,
+ 			"<explain xmlns=\"http://www.postgresql.org/2009/explain\";>\n");

That bug is fixed - as referenced above I wrote a schema and validated it. So,
yes, the generated XML was valid at least before the last round of
refactoring. And I looked through the output quite a bit so I would be surprised
if there is such a breakage.

That ";" after the attribute is almost certainly wrong. This is a classic
case of what I was talking about a month or two ago. Building up XML (or
any structured doc, really, XML is not special in this regard) by ad hoc
methods is horribly error prone. if you don't want to rely on libxml, then
I think you need to develop a lightweight abstraction rather than just
appending to a StringInfo.

Actually, by now a not-insignificant portion already "outsources" this; only
some special cases (empty attributes, no newlines wanted, the initial element
with its namespace) do not.

While it would be possible to add another step in between, generating a
format-neutral tree and then producing the different formats from it, I am not
sure that this is worthwhile.
The current text format will need to stay special-cased anyway, because it's
far too inconsistent to generate from anything abstract, and I don't see any
completely new formats coming (i.e. not just optional parts)?

Andres

#19Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#18)
Re: machine-readable explain output v4

Andres Freund wrote:

BTW, has anyone tried validating the XML at all? I just looked very
briefly at the patch at
<http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
I noticed this which makes me suspicious:

+ 	if (es.format == EXPLAIN_FORMAT_XML)
+ 		appendStringInfoString(es.str,
+ 			"<explain xmlns=\"http://www.postgresql.org/2009/explain\";>\n");

That bug is fixed - as referenced above I wrote a schema and validated it. So,
yes, the generated XML was valid at least before the last round of
refactoring. And I looked through the output quite a bit so I would be surprised
if there is such a breakage.

Then someone needs to update the commitfest page with the latest patch. That's the link I followed.

Hmm. I wonder how I got the link to an old version of the patch.

Anyway, I'm glad it's fixed.

cheers

andrew

#20Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#19)
Re: machine-readable explain output v4

Andrew Dunstan wrote:

Andres Freund wrote:

BTW, has anyone tried validating the XML at all? I just looked very
briefly at the patch at
<http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
I noticed this which makes me suspicious:

+     if (es.format == EXPLAIN_FORMAT_XML)
+         appendStringInfoString(es.str,
+             "<explain xmlns=\"http://www.postgresql.org/2009/explain\";>\n");

That bug is fixed - as referenced above I wrote a schema and
validated it. So, yes, the generated XML was valid at least before
the last round of refactoring. And I looked through the output quite
a bit so I would be surprised if there is such a breakage.

Then someone needs to update the commitfest page with the latest
patch. That's the link I followed.

Hmm. I wonder how i got the link to an old version of the patch.

Anyway, I'm glad it's fixed.

I take it back. It's still there at
<http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php>
posted 3 days ago.

cheers

andrew

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#20)
#23Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#20)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#20)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
#27Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#21)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#28)
#30Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#29)
#31Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#26)
#32Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#30)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
#34Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#24)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#33)
#36Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#18)
#37Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#36)
#38Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#37)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#39)
#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#40)
#42Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#22)
#43Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#41)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#43)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#46)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#49)
#51Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#50)
#52Mike Benoit
ipso@snappymail.ca
In reply to: Robert Haas (#51)
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mike Benoit (#52)
#54Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#53)
#55Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#54)
#56Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#54)
#57Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#56)
#58Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#56)
#59Csaba Nagy
nagy@ecircle-ag.com
In reply to: Andrew Dunstan (#58)
#60Andrew Dunstan
andrew@dunslane.net
In reply to: Csaba Nagy (#59)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#60)
#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#60)
#63Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#62)
#64Csaba Nagy
nagy@ecircle-ag.com
In reply to: Andrew Dunstan (#60)
#65Csaba Nagy
nagy@ecircle-ag.com
In reply to: Csaba Nagy (#64)
#66Andrew Dunstan
andrew@dunslane.net
In reply to: Csaba Nagy (#64)
#67Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrew Dunstan (#60)
#68Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#67)
#69Csaba Nagy
nagy@ecircle-ag.com
In reply to: Andrew Dunstan (#66)
#70Mike Benoit
ipso@snappymail.ca
In reply to: Andrew Dunstan (#60)
#71Csaba Nagy
nagy@ecircle-ag.com
In reply to: Csaba Nagy (#69)
#72Andrew Dunstan
andrew@dunslane.net
In reply to: Csaba Nagy (#69)
#73Csaba Nagy
nagy@ecircle-ag.com
In reply to: Andrew Dunstan (#72)
#74Andrew Dunstan
andrew@dunslane.net
In reply to: Csaba Nagy (#69)
#75Csaba Nagy
nagy@ecircle-ag.com
In reply to: Andrew Dunstan (#74)
#76Greg Smith
gsmith@gregsmith.com
In reply to: Mike Benoit (#52)