Add GUC to enable libxml2's XML_PARSE_HUGE

Started by Jim Jones8 months ago6 messageshackers
Jump to latest
#1Jim Jones
jim.jones@uni-muenster.de

Hi,

In commit 71c0921 we re-introduced use of xmlParseBalancedChunkMemory in
order to allow parsing of large XML documents with certain libxml2
versions [1]/messages/by-id/a8771e75-60ee-4c99-ae10-ca4832e1ec8d@uni-muenster.de. While that solved a regression issue, it still leaves the
handling of very large or deeply nested XML documents tied to libxml2’s
internal limits and behaviuor.

To address this, Erik and I would like to propose a new GUC,
xml_parse_huge, which controls libxml2’s XML_PARSE_HUGE option. This
makes the handling of large XML documents explicit and independent of
libxml2 version quirks. The new predefined role pg_xml_parse_huge allows
superusers to grant session-level use of this option without granting
full superuser rights, so DBAs can flexibly delegate the capability in a
controlled manner.

Examples:

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE USER u1;
CREATE ROLE
postgres=# CREATE DATABASE db OWNER u1;
CREATE DATABASE
postgres=# \q

# By default a user cannot set this parameter and the default value is 'off'

$ /usr/local/postgres-dev/bin/psql -d db -U u1
psql (19devel)
Type "help" for help.

db=> SHOW xml_parse_huge;
 xml_parse_huge
----------------
 off
(1 row)

db=> SET xml_parse_huge TO on;
ERROR:  permission denied to set parameter "xml_parse_huge"
HINT:  You must be a superuser or a member of the "pg_xml_parse_huge"
role to set this option.

db=> ALTER SYSTEM SET xml_parse_huge TO on;
ERROR:  permission denied to set parameter "xml_parse_huge"

# This leads libxml2 to raise an error for text nodes exceeding
XML_MAX_TEXT_LENGTH

db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) ||
'</root>')::xml;
ERROR:  invalid XML content
DETAIL:  line 1: Resource limit exceeded: Text node too long, try
XML_PARSE_HUGE
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

# The role pg_xml_parse_huge allows the user to set the new parameter

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# GRANT pg_xml_parse_huge TO u1;
GRANT ROLE
postgres=# \q

$ /usr/local/postgres-dev/bin/psql -d db -U u1
psql (19devel)
Type "help" for help.

db=> SET xml_parse_huge TO on;
SET
db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) ||
'</root>')::xml;
SELECT 1

# It is also possible to enable this feature by default for a user

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE USER u2;
CREATE ROLE
postgres=# GRANT pg_xml_parse_huge TO u2;
GRANT ROLE
postgres=# ALTER USER u2 SET xml_parse_huge TO on;
ALTER ROLE
postgres=# \q

$ /usr/local/postgres-dev/bin/psql -d db -U u2
psql (19devel)
Type "help" for help.

db=> SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 on
(1 row)

# A superuser can enable this feature for a whole database (or the whole
cluster via postgresql.conf):

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE DATABASE db2;
CREATE DATABASE
postgres=# ALTER DATABASE db2 SET xml_parse_huge TO on;
ALTER DATABASE
postgres=# SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 off
(1 row)

postgres=# \c db2
You are now connected to database "db2" as user "jim".
db2=# SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 on
(1 row)

Attached is a first draft.

* I'm CC'ing Tom and Michael since they were involved in the earlier
discussion.

Initially we considered creating a second GUC instead of a role, but
decided that would be confusing and less manageable than having a single
GUC with role-based delegation.

Any thoughts or comments?

[1]: /messages/by-id/a8771e75-60ee-4c99-ae10-ca4832e1ec8d@uni-muenster.de
/messages/by-id/a8771e75-60ee-4c99-ae10-ca4832e1ec8d@uni-muenster.de

Best regards, Jim

Attachments:

v1-0001-Add-GUC-to-enable-libxml2-s-XML_PARSE_HUGE-option.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-GUC-to-enable-libxml2-s-XML_PARSE_HUGE-option.patchDownload+300-29
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Jones (#1)
Re: Add GUC to enable libxml2's XML_PARSE_HUGE

Jim Jones <jim.jones@uni-muenster.de> writes:

To address this, Erik and I would like to propose a new GUC,
xml_parse_huge, which controls libxml2’s XML_PARSE_HUGE option.

Given the spotty security history of libxml2, I can't really see
how this wouldn't be enormously unsafe. Even as a superuser-only
option, it seems like a bad idea.

Independently of that, we have learned the hard way that GUCs
that change application-visible query semantics are a bad idea.
You cannot really argue that this wouldn't be one.

regards, tom lane

#3Jim Jones
jim.jones@uni-muenster.de
In reply to: Tom Lane (#2)
Re: Add GUC to enable libxml2's XML_PARSE_HUGE

Hi Tom

Thanks for replying so quickly!

On 20.08.25 17:46, Tom Lane wrote:

Given the spotty security history of libxml2, I can't really see
how this wouldn't be enormously unsafe. Even as a superuser-only
option, it seems like a bad idea.

I was under the impression that the status quo already exposes
PostgreSQL to these risks silently, depending on the libxml2
(xmlParseBalancedChunkMemory) version in use. Our rationale was that
making the feature explicit would at least render it visible, auditable,
and delegable by superusers. Of course, the documentation should clearly
warn about the risks.

Independently of that, we have learned the hard way that GUCs
that change application-visible query semantics are a bad idea.
You cannot really argue that this wouldn't be one.

I share these concerns about changing query semantics through GUCs, but
I thought this case might not be so different from xmloption:

postgres=# SET xmloption TO document;
SET
postgres=# SELECT 'hello'::xml;
ERROR:  invalid XML document
LINE 1: SELECT 'hello'::xml;
               ^
DETAIL:  line 1: Start tag expected, '<' not found
hello
^
postgres=# SET xmloption TO content;
SET
postgres=# SELECT 'hello'::xml;
  xml  
-------
 hello
(1 row)

Would you see any other way, other than a GUC, to provide this feature?

Thanks!

Best, Jim

#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#2)
Re: Add GUC to enable libxml2's XML_PARSE_HUGE

On Wed, Aug 20, 2025 at 11:46:11AM -0400, Tom Lane wrote:

Independently of that, we have learned the hard way that GUCs
that change application-visible query semantics are a bad idea.
You cannot really argue that this wouldn't be one.

For reference: a famous example of that is autocommit as far as I
recall, with commit f85f43dfb5b9.
--
Michael

#5Erik Wienhold
ewie@ewie.name
In reply to: Jim Jones (#3)
Re: Add GUC to enable libxml2's XML_PARSE_HUGE

On 2025-08-20 21:42 +0200, Jim Jones wrote:

On 20.08.25 17:46, Tom Lane wrote:

Independently of that, we have learned the hard way that GUCs that
change application-visible query semantics are a bad idea. You
cannot really argue that this wouldn't be one.

I totally forgot about that stance. Is this only about adding new GUCs?
Because there are existing GUCs that change semantics, e.g. xmlbinary,
check_function_bodies. I guess there's a trade-off between usefulness
and being error-prone/surprising to the user (POLA).

I share these concerns about changing query semantics through GUCs,
but I thought this case might not be so different from xmloption:

postgres=# SET xmloption TO document;
SET
postgres=# SELECT 'hello'::xml;
ERROR:  invalid XML document
LINE 1: SELECT 'hello'::xml;
               ^
DETAIL:  line 1: Start tag expected, '<' not found
hello
^
postgres=# SET xmloption TO content;
SET
postgres=# SELECT 'hello'::xml;
  xml  
-------
 hello
(1 row)

I guess the excuse for xmloption is that the SQL standard defines SET
XML OPTION.

Would you see any other way, other than a GUC, to provide this
feature?

Forking off the core XML code into an extension to provide a custom xml
data type with the desired parsing behavior? :(

--
Erik Wienhold

#6Jim Jones
jim.jones@uni-muenster.de
In reply to: Erik Wienhold (#5)
Re: Add GUC to enable libxml2's XML_PARSE_HUGE

On 21.08.25 04:26, Erik Wienhold wrote:

I guess the excuse for xmloption is that the SQL standard defines SET
XML OPTION.

I guess you're right... in this case the standard dictates what should
be done. It just doesn't change the fact that depending on the parameter
value the same query succeeds or fails, which is pretty much what would
happen with xml_parse_huge.

Would you see any other way, other than a GUC, to provide this
feature?

Forking off the core XML code into an extension to provide a custom xml
data type with the desired parsing behavior? :(

That could work. I'm just afraid it would be too much trouble just to
enable text nodes larger than 10MB.

Regards, Jim