XML - DOCTYPE element - documentation suggestion

Started by Craig Ringer almost 16 years ago · 2 messages · general
#1 Craig Ringer
craig@2ndquadrant.com

Hi all

I've been working with XML storage in Pg and was puzzled by the fact
that Pg appears to refuse to store a document with a DOCTYPE declaration
- it was interpreting it as a regular element and rejecting it.

This turns out to be because Pg parses XML as a fragment (i.e. option
CONTENT) when casting, and XML fragments cannot have a doctype.
Unfortunately the error is ... unhelpful ... and the documentation
neglects to mention this issue. Hence my post.
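For reference, a minimal sketch of how the parse mode used by casts can be inspected and switched per session. It assumes the XML OPTION session setting (xmloption) governs text-to-xml casts as described above; the sample document is purely illustrative, and behaviour may differ between server versions.

```sql
-- Show which mode casts from text to xml currently use
-- (CONTENT is the default assumed in this thread).
SHOW xmloption;

-- Switch this session to parse casts as whole documents.
SET XML OPTION DOCUMENT;

-- With DOCUMENT mode active, a plain cast should accept a DOCTYPE.
SELECT '<!DOCTYPE test SYSTEM ''test.dtd''><test>dummy content</test>'::xml;

-- Restore the default fragment mode.
SET XML OPTION CONTENT;
```

Note that this changes how every cast in the session is parsed, so fragments that previously cast cleanly would be rejected while DOCUMENT mode is active.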

I didn't see anything about this in the FAQ or in the docs for the XML
datatype
(http://www.postgresql.org/docs/current/interactive/datatype-xml.html)
and was half-way through writing this post when I found a helpful
message on the list:

http://www.mail-archive.com/pgsql-general@postgresql.org/msg119713.html

that hinted the way. Even then it took me a while to figure out that you
can't specify DOCUMENT or CONTENT on the XML type itself, but must
specify it while parsing instead and use a CHECK constraint if you want
to require storage of whole documents in a field.
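A rough sketch of that CHECK constraint approach, using the IS DOCUMENT predicate. The table name test_xml_docs is made up for illustration, and the second INSERT is only there to show the kind of value the constraint is meant to reject.

```sql
-- Hypothetical table: the CHECK only lets whole documents through.
CREATE TABLE test_xml_docs (
    doc xml CHECK (doc IS DOCUMENT)
);

-- A value with a single root element is a document and should be accepted.
INSERT INTO test_xml_docs (doc)
VALUES (XMLPARSE(DOCUMENT '<test>dummy content</test>'));

-- A fragment with two top-level elements parses as CONTENT,
-- but should then be rejected by the CHECK constraint.
INSERT INTO test_xml_docs (doc)
VALUES (XMLPARSE(CONTENT '<a/><b/>'));
```

The constraint does not change how incoming text is parsed; it only rejects stored values that are not documents, so inserts still need XMLPARSE(DOCUMENT ...) for input carrying a DOCTYPE.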

The xml datatype documentation should probably mention that whole
documents must be loaded with XMLPARSE(DOCUMENT 'doc_text_here'); they
cannot just be cast from text to xml, as happens when you pass an xml
document as text to a parameter during an INSERT. This should probably
appear under "CREATING XML VALUES" in:

http://www.postgresql.org/docs/current/static/datatype-xml.html

... and probably deserves mention in a new "CAVEATS" or "NOTES" section
too, as it *will* catch people out even if they RTFM.

I'd expect this to work:

CREATE TABLE test_xml ( doc xml );

INSERT INTO test_xml ( doc ) VALUES (
$$<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>$$
);

... but it fails with:

ERROR: invalid XML content
LINE 2: $$<?xml version="1.0" encoding="utf-8"?>
^
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>
^

though xmllint (from libxml) is quite happy with the document. This had
me quite confused for a while.
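For comparison, a sketch of the same INSERT going through XMLPARSE(DOCUMENT ...) instead of an implicit cast. It reuses the test_xml table from above and assumes the server does not need test.dtd to actually exist in order to parse the document.

```sql
-- Same document as above, but parsed explicitly as a whole document
-- rather than being cast from text in CONTENT mode.
INSERT INTO test_xml ( doc ) VALUES (
    XMLPARSE(DOCUMENT $$<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>$$)
);

-- Confirm that the stored value is a document, not a fragment.
SELECT doc IS DOCUMENT AS is_document FROM test_xml;
```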

--
Craig Ringer

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Craig Ringer (#1)
Re: XML - DOCTYPE element - documentation suggestion

On Fri, 2010-06-18 at 02:43 +0800, Craig Ringer wrote:

The xml datatype documentation should probably mention that whole
documents must be loaded with XMLPARSE(DOCUMENT 'doc_text_here'); they
cannot just be cast from text to xml, as happens when you pass an xml
document as text to a parameter during an INSERT. This should probably
appear under "CREATING XML VALUES" in:

http://www.postgresql.org/docs/current/static/datatype-xml.html

... and probably deserves mention in a new "CAVEATS" or "NOTES" section
too, as it *will* catch people out even if they RTFM.

Done