Fwd: Question about xmloption and pg_restore

Started by Stefan Fercotover 7 years ago5 messages
#1Stefan Fercot
Stefan Fercot
stefan.fercot@dalibo.com

Hi all,

I've got some question about XML OPTION and pg_restore.

In a xml type column are stored documents.

When restoring a plain SQL dump, we got the message :

/ERROR:  invalid XML content//
//DETAIL:  line 1: StartTag: invalid element name//
//<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd&#39; 'SYSTEM'>//
// ^//
//CONTEXT:  COPY layer_styles, line 1, column styleqml: "<!DOCTYPE qgis
PUBLIC 'http://mrcc.com/qgis.dtd&#39; 'SYSTEM'>/

Adding "/SET XML OPTION DOCUMENT;/" in top of the dump file allows to
restore it.

Now, with the "custom" format, we have to use before pg_restore :
/export PGOPTIONS="-c xmloption=DOCUMENT"/.

Do you think of any other way to solve this issue ?

What if we got the 2 xml options used in the database?

Would it be possible to have something like : ALTER TABLE ... ALTER
COLUMN ... SET XML OPTION ...; ?

Kind regards,

--
Stefan FERCOT
http://dalibo.com - http://dalibo.org

#2Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Stefan Fercot (#1)
Re: Question about xmloption and pg_restore

On Thu, May 17, 2018 at 9:37 AM, Stefan Fercot <stefan.fercot@dalibo.com> wrote:

When restoring a plain SQL dump, we got the message :

ERROR: invalid XML content
DETAIL: line 1: StartTag: invalid element name
<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd&#39; 'SYSTEM'>
^
CONTEXT: COPY layer_styles, line 1, column styleqml: "<!DOCTYPE qgis PUBLIC
'http://mrcc.com/qgis.dtd&#39; 'SYSTEM'>

Adding "SET XML OPTION DOCUMENT;" in top of the dump file allows to restore
it.

Hmm. I thought that xmloption = 'content' was supposed to be strictly
more permissive than xmloption = 'document'.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Chapman Flack
Chapman Flack
chap@anastigmatix.net
In reply to: Robert Haas (#2)
Re: Re: Question about xmloption and pg_restore

On 05/18/18 15:50, Robert Haas wrote:

On Thu, May 17, 2018 at 9:37 AM, Stefan Fercot <stefan.fercot@dalibo.com> wrote:

ERROR: invalid XML content
DETAIL: line 1: StartTag: invalid element name
<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd&#39; 'SYSTEM'>
^
CONTEXT: COPY layer_styles, line 1, column styleqml: "<!DOCTYPE qgis PUBLIC
'http://mrcc.com/qgis.dtd&#39; 'SYSTEM'>

Adding "SET XML OPTION DOCUMENT;" in top of the dump file allows to restore
it.

Hmm. I thought that xmloption = 'content' was supposed to be strictly
more permissive than xmloption = 'document'.

In the spirit of not leaving a good question hanging, this turns out to be
a difference between the 2003 SQL/XML standard (which PG implements) and
the later versions, which changed the data model so there really is a
containment relationship between 'content' and 'document'.

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION

-Chap

#4Tom Lane
Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chapman Flack (#3)
Re: Question about xmloption and pg_restore

Chapman Flack <chap@anastigmatix.net> writes:

On 05/18/18 15:50, Robert Haas wrote:

Hmm. I thought that xmloption = 'content' was supposed to be strictly
more permissive than xmloption = 'document'.

In the spirit of not leaving a good question hanging, this turns out to be
a difference between the 2003 SQL/XML standard (which PG implements) and
the later versions, which changed the data model so there really is a
containment relationship between 'content' and 'document'.
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION

See also
/messages/by-id/153478795159.1302.9617586466368699403@wrigleys.postgresql.org

It's odd that people are just reporting this now when it's been like that
for quite a few years, but anyway we've got a problem. Sounds like maybe
adopting the later standards' definitions would fix it? Although I have
no idea how complicated that'd be.

regards, tom lane

#5Chapman Flack
Chapman Flack
chap@anastigmatix.net
In reply to: Tom Lane (#4)
Re: Question about xmloption and pg_restore

On 10/25/18 05:02, Tom Lane wrote:

Chapman Flack <chap@anastigmatix.net> writes:

a difference between the 2003 SQL/XML standard (which PG implements) and
the later versions, which changed the data model so there really is a
containment relationship between 'content' and 'document'.
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION

See also
/messages/by-id/153478795159.1302.9617586466368699403@wrigleys.postgresql.org

It's odd that people are just reporting this now when it's been like that
for quite a few years, but anyway we've got a problem. Sounds like maybe
adopting the later standards' definitions would fix it? Although I have
no idea how complicated that'd be.

Supporting the later standards entirely would be a commendable thing,
but honest work:

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Possible_ways_forward

OTOH, making the current XML parsing not fail in this particular case
(which could be viewed as adopting the later standards' relationship
of CONTENT to DOCUMENT) might just be as simple as having the current
parsing code for CONTENT detect whether the string "starts with" a
<!DOCTYPE and fall back to the existing parsing code for DOCUMENT
if it does.

... where "starts with" actually means "possibly following some
whitespace, comments, or PIs, but you can stop looking if you see
a start-element", so essentially a port to C of:

https://github.com/tada/pljava/blob/V1_5_1/pljava/src/main/java/org/postgresql/pljava/jdbc/SQLXMLImpl.java#L409

which decides whether the input should be passed straight to the DOCUMENT-
style parser or somehow treated specially to parse as CONTENT. In Java
the special treatment involves a wrapping element, in xml.c it involves
calling a different libxml2 function, xmlParseBalancedChunkMemory, but
the choice of which method to apply is the same choice.

IIRC, XML comments don't nest, so it may be that "possibly following
some whitespace, comments, or PIs" could be shown to be a regular language,
and checked with a regex. I did it the more explicit way in Java for
clarity, and because the API was there, and so I wouldn't have to think
about it.

-Chap