XML schemas and PG column names
I'm doing some work with the output of query_to_xml_and_xmlschema(). The
output is a bit unfortunate in my opinion when the column names in the
query are not legal XML names. We quite reasonably translate the names
so that legal XML names result, but we don't actually put the original
name anywhere in the output, meaning that the end processor has some
work to do to recover the original. Here are two snippets from the
output when the column names are "z" and "25 %ile":
<xsd:complexType name="RowType">
<xsd:sequence>
<xsd:element name="z" type="INTEGER" nillable="true"></xsd:element>
<xsd:element name="_x0032_5_x0020__x0025_ile" type="INTEGER"
nillable="true"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<row>
<z>1</z>
<_x0032_5_x0020__x0025_ile>2</_x0032_5_x0020__x0025_ile>
</row>
Of course, we can recover the original name by using something like perl
to do operations like this:
$column_name =~ s/_x([[:xdigit:]]{4})_/pack("U",hex($1))/ge;
but that's ugly and not as simply available in many XSL processors (I am
using XSL to transform the XML.)
I propose that we annotate the schema section RowType elements with the
original names, so we would have something like this in the schema section:
<xsd:complexType name="RowType"
xmlns:pg="http://www.postgresql.org/schemas/column-names">
<xsd:sequence>
<xsd:element name="z" type="INTEGER" nillable="true">
<xsd:annotation>
<xsd:appinfo>
<pg:column-name>z</pg:column-name>
</xsd:appinfo>
</xsd:annotation>*
* </xsd:element>
<xsd:element name="_x0032_5_x0020__x0025_ile" type="INTEGER"
nillable="true">
<xsd:annotation>
<xsd:appinfo>
<pg:column-name>25 %ile</pg:column-name>
</xsd:appinfo>
* *</xsd:annotation>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
While it might be a bit longwinded, it's not going to add to the per-row
output, just the schema section.
Thoughts?
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
I propose that we annotate the schema section RowType elements with the
original names, so we would have something like this in the schema section:
1. Is that legal per the SQL/XML spec?
2. What happens when the column name contains characters that would have
to be escaped, such as "<" --- haven't you just replaced one de-escaping
problem with another?
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I propose that we annotate the schema section RowType elements with the
original names, so we would have something like this in the schema section:1. Is that legal per the SQL/XML spec?
It is certainly legal per XML and XSD specs, and the SQL/XML spec has
annotations using appinfo elements. It would be rather surprising if the
SQL/XML spec forbade annotations such as I propose. The spec is
mind-bogglingly impenetrable, though. Perhaps Peter or Nicholas might know.
2. What happens when the column name contains characters that would have
to be escaped, such as "<" --- haven't you just replaced one de-escaping
problem with another?
No. say the name is "foo & bar < baz". Then the annotation would be:
<pg:column-name>foo & bar < baz</pg:column-name>
But the difference is that the XML processor will automatically unescape
this value (and re-escape it on output if necessary). The user won't
have to do anything (or shouldn't if their XML processor is worth
anything at all). So in a stylesheet, I'd be able to do something like:
<xsl:for-each select="//[complexType[@name="RowType"]//pg:column-name">
<th><xsl:value-of select="." /></th>
</xsl:for-each>
and it would just Do The Right Thing. (If we didn't want the output
re-escaped, say when the otuput format was not XML or HTML, we could
make it do that too).
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
2. What happens when the column name contains characters that would have
to be escaped, such as "<" --- haven't you just replaced one de-escaping
problem with another?
But the difference is that the XML processor will automatically unescape
this value (and re-escape it on output if necessary). The user won't
have to do anything (or shouldn't if their XML processor is worth
anything at all).
OK, so your argument is that this is a standard escaping rule and the
one in the SQL standard is, um, not standard. I wonder why the SQL
committee felt compelled to invent their own, then?
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
2. What happens when the column name contains characters that would have
to be escaped, such as "<" --- haven't you just replaced one de-escaping
problem with another?But the difference is that the XML processor will automatically unescape
this value (and re-escape it on output if necessary). The user won't
have to do anything (or shouldn't if their XML processor is worth
anything at all).OK, so your argument is that this is a standard escaping rule and the
one in the SQL standard is, um, not standard. I wonder why the SQL
committee felt compelled to invent their own, then?
They are two different things. An XML-escaped text value is by no means
necessarily a legal XML tag name (e.g. an XML name can't have spaces).
Possibly what they really did wrong was to try to map SQL column names
to XML tags at all. It might have been better to do something like:
<column name="foo & bar">some value</column>
instead of what we produce, which I assume is in the standard:
<foo_x0020__x0026__x0020_bar>somevalue</foo_x0020__x0026__x0020_bar>
which I think is just plain ugly. OTOH, then it would have been far
harder (maybe impossible) to create an XML schema for such a mechanism,
so I assume that's why they did it this way.
Anyway, It would be nice to have a way of providing the non-mangled
names - I think what I have suggested should meet the case.
cheers
andrew
On lör, 2009-12-12 at 11:51 -0500, Andrew Dunstan wrote:
It is certainly legal per XML and XSD specs, and the SQL/XML spec has
annotations using appinfo elements. It would be rather surprising if
the
SQL/XML spec forbade annotations such as I propose. The spec is
mind-bogglingly impenetrable, though. Perhaps Peter or Nicholas might
know.
I think we can of course add our own annotations. It would be good to
go through the SQL/XML standard document and check what style they use
for their annotations so that we can structure and name ours similarly
and have room for future work, in case someone also wants annotations
for table names, schema names, etc. (Or was that part of your project as
well?)
Peter Eisentraut wrote:
On lör, 2009-12-12 at 11:51 -0500, Andrew Dunstan wrote:
It is certainly legal per XML and XSD specs, and the SQL/XML spec has
annotations using appinfo elements. It would be rather surprising if
the
SQL/XML spec forbade annotations such as I propose. The spec is
mind-bogglingly impenetrable, though. Perhaps Peter or Nicholas might
know.I think we can of course add our own annotations. It would be good to
go through the SQL/XML standard document and check what style they use
for their annotations so that we can structure and name ours similarly
and have room for future work, in case someone also wants annotations
for table names, schema names, etc. (Or was that part of your project as
well?)
Well, the standard has an element specifically for annotations
concerning certain objects: sqlxml:sqlname. However, I am not sure if it
can be used in this context. Can you try reading the standard
(<http://www.sqlx.org/SQL-XML-documents/5FCD-14-XML-2004-07.pdf>) and
tell me? :-) If it's not comprehended by the standard then we should at
least use a different namespace, and probably a different element name.
The style can be made to match, though.
I certainly think we could do more of this, although the column names
are what matter to me right now. We can also do it bit by bit.
cheers
andrew