XML import with DTD

Started by Roy Walter · almost 17 years ago · 5 messages · general
#1 Roy Walter
walt@brookhouse.co.uk

Hi

I'm trying to use the XPath functionality of Postgres.

I can populate a text field (unparsed) with XML data but as far as I can
see the xpath() function [now] only works on the xml data type.
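
As a rough illustration of the point above: xpath() takes an xml value and returns an array of xml (xml[]), so each match comes back as a separate array element. The sketch below assumes the wms_collection table and docxml column that appear later in this thread; some_table and text_col are hypothetical names for a table with a text column holding XML.

```sql
-- xpath() on an xml column: each matching node is one element of the xml[] result.
SELECT xpath('/shop/product/text()', docxml) AS products
FROM wms_collection;

-- For XML held in a text column (some_table and text_col are hypothetical names),
-- the value has to be parsed into xml first, for example with XMLPARSE.
SELECT xpath('/shop/product/text()', XMLPARSE(DOCUMENT text_col)) AS products
FROM some_table;
```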

When I try to populate an xml field with XML data containing a DTD,
however, the parser chokes. If I strip the DTD, the parser chokes on
undefined entities that were defined in the DTD.

(I switched the app from MySQL to Postgres because, while MySQL works,
it returns matches in undelimited form, which is next to useless if, for
example, you return multiple attributes from a node.)

Does anyone know of a solution to this problem?

Windows 2000 Server
Postgres 8.4

Regards
Roy Walter

#2 Scott Bailey
artacus@comcast.net
In reply to: Roy Walter (#1)
Re: XML import with DTD

Post a snippet of the xml and xpath you are trying to use.

Scott

#3 Roy Walter
walt@brookhouse.co.uk
In reply to: Scott Bailey (#2)
Re: XML import with DTD

It's not an xpath problem; it's an XML import problem. Sorry if I wasn't
clear.

Consider the following example queries. This one works fine:

INSERT INTO wms_collection (docxml) VALUES (XMLPARSE(content(
'<?xml version="1.0" encoding="ISO-8859-1"?>
<shop>
<product>Shoes</product>
</shop>')))

This one does not:

INSERT INTO wms_collection (docxml) VALUES (XMLPARSE(content(
'<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE publicwhip
[
<!ENTITY ndash "&#8211;">
<!ENTITY mdash "&#8212;">
]>
<shop>
<product>Shoes</product>
</shop>')))

Both are valid XML but the second query fails as follows:

ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE publicwhip
^
Entity: line 4: parser error : StartTag: invalid element name
<!ENTITY ndash "&#8211;">
^
Entity: line 5: parser error : StartTag: invalid element name
<!ENTITY mdash "&#8212;">

-- Roy

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roy Walter (#3)
Re: XML import with DTD

Roy Walter <walt@brookhouse.co.uk> writes:

> This one does not:
>
> INSERT INTO wms_collection (docxml) VALUES (XMLPARSE(content(
> '<?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE publicwhip
> [
> <!ENTITY ndash "&#8211;">
> <!ENTITY mdash "&#8212;">
> ]>
> <shop>
> <product>Shoes</product>
> </shop>')))

What I know about XML wouldn't fill a thimble, but shouldn't you say
DOCUMENT not CONTENT if you are trying to provide a complete document?
Doing that seems to make this work without error.

The fine manual states near the bottom of 8.13.1
http://www.postgresql.org/docs/8.4/static/datatype-xml.html
that CONTENT is less restrictive than DOCUMENT, but at least for
this specific point that seems not to be true.

regards, tom lane
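
Applied to the failing query from message #3, the suggestion above might look like the following sketch. It reuses the wms_collection table, the docxml column, and the XML text from that message unchanged; only the CONTENT keyword is swapped for DOCUMENT.

```sql
-- DOCUMENT instead of CONTENT: the value is parsed as a complete XML document,
-- so the DOCTYPE with its internal entity declarations is accepted.
INSERT INTO wms_collection (docxml) VALUES (XMLPARSE(DOCUMENT
'<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE publicwhip
[
<!ENTITY ndash "&#8211;">
<!ENTITY mdash "&#8212;">
]>
<shop>
<product>Shoes</product>
</shop>'));
```

Note that casts from character strings to xml (for example '...'::xml) do not take a DOCUMENT or CONTENT keyword. They follow the xmloption session setting, which defaults to CONTENT and can be changed with SET xmloption TO DOCUMENT;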

#5 Roy Walter
walt@brookhouse.co.uk
In reply to: Tom Lane (#4)
Re: XML import with DTD

Doh! That's it. Thanks a million.

-- Roy
