WIP - xmlvalidate implementation from TODO list
Hello!
I am likely one of the few people still using XML, but I noticed XSD schema
validation is still a TODO on postgresql, which I have some personal use
cases for.
In this patch I attempt to implement XMLVALIDATE with the already in use
libxml following a version I got my hands on of the SQL/XML standard of
2016.
In short, I had to add the ACCORDING word to comply with it and completely
ignored the version that was already in the src that fetches arbitrary
schemas (it refers to validations of dtds, which is another more
troublesome TODO).
I had problems running the regression tests on my machine, so I could only
test the feature by spawning a modified instance of postgresql and issuing
queries through psql, therefore I am marking it as WIP. If anyone can
assert the tests pass, I would be glad.
Also, this is my first patch, so I might have not followed standard
practices as best as I could, so please pay particular attention to that on
review.
Cheers,
Marcos Magueta.
Attachments:
xml-patchapplication/octet-stream; name=xml-patchDownload+546-4
On Sun, 7 Dec 2025 at 04:38, Marcos Magueta <maguetamarcos@gmail.com> wrote:
Hello!
I am likely one of the few people still using XML, but I noticed XSD schema validation is still a TODO on postgresql, which I have some personal use cases for.
In this patch I attempt to implement XMLVALIDATE with the already in use libxml following a version I got my hands on of the SQL/XML standard of 2016.
In short, I had to add the ACCORDING word to comply with it and completely ignored the version that was already in the src that fetches arbitrary schemas (it refers to validations of dtds, which is another more troublesome TODO).
I had problems running the regression tests on my machine, so I could only test the feature by spawning a modified instance of postgresql and issuing queries through psql, therefore I am marking it as WIP. If anyone can assert the tests pass, I would be glad.
Also, this is my first patch, so I might have not followed standard practices as best as I could, so please pay particular attention to that on review.
Cheers,
Marcos Magueta.
HI!
1)
+ // Default case since nothing got returned + // out of the normal path for validation calls to libxml
PostgreSQL uses /**/ comments style.
2)
XML regression test suite fails, see attached. By the way, what are
your issues with running `make check` ?
3)
By the way, in [0]https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards we have this
`
The function in PostgreSQL produces an “unimplemented” error, because
PostgreSQL does not have any implementation of the mechanism assumed
in the standard for registering schemas in advance, which is necessary
to address the security implications of a function that could refer to
schemas from arbitrary locations.
`
How does your patch resolve this? I did not find any change in this area
4)
Also I want to mention that without a doc, the patch is not in a
commitable shape.
5) I am a bit surprised by this grammar rule:
XMLVALIDATE '(' document_or_content a_expr ACCORDING TO XMLSCHEMA a_expr ')'
this allow a wide class of expressions accepted by parser, like
`SELECT xmlvalidate(DOCUMENT (select oid from pg_class) ACCORDING TO
XMLSCHEMA (select 32)) AS is_valid FROM xml_validation_test;`
Is this expected? a_expr is way more than string constants and column
references.. If yes, the regression test that you added, does not
test this..
p.s. I failed to find in google SQL/XML standard of 2016. So, I cannot
double-check if this feature is compliant with it...
[0]: https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards
--
Best regards,
Kirill Reshke
Attachments:
xml_diff.diffsapplication/octet-stream; name=xml_diff.diffsDownload+1-8
Thank you for your kind review!
Before I continue with the implementation, I would like to address
your concerns and discuss it further and see if it's worth carrying
on.
1) Will do!
2) The issue is the following:
```
./../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20
--schedule=./parallel_schedule
# +++ regress check in src/test/regress +++
# initializing database system by copying initdb template
# could not exec "sh": No such file or directory
Bail out!# postmaster failed, examine
```
This is likely due to my setup on Nix. If any command assumes paths on
conventional *nix I am often in trouble. I am checking that with a
friend, but any insights are welcome. The out file I generated was
while being completely blindfolded.
By the way, the diff you sent is assuming global paths for some
reason, so I couldn't apply it without manually changing them.
3) From what I understand, that refers to ISO/IEC 9075-14:2016, chapter and
section 11.6, page 245:
- Parts of the grammar that reference an URI: <XML valid according to
what> ::= <XML valid according to URI> | <XML valid according to identifier>
- NO NAMESPACE, which unspecifies the qualification but can access
something through a LOCATI ON
- ID which should allow access to a registered schema
So they amount to:
ACCORDING TO XMLSCHEMA URI <uri> [LOCATION <uri>]
ACCORDING TO XMLSCHEMA NO NAMESPACE [LOCATION <uri>]
ACCORDING TO XMLSCHEMA ID <registered_schema_name>
What I did is rely on the protection mechanisms that are already
implemented to just side-step the issue of arbitrary retrieval.
`PgXmlErrorContext * pg_xml_init(PgXmlStrictness strictness)` starts
the xml error context preventing any attempt by libxml2 to load an
external entity (DTD, XSD from URL, local file, etc.) returns an empty
string instead. Check around the line 1420 of xml.c:
```
/*
* Also, install an entity loader to prevent unwanted fetches of external
* files and URLs.
*/
errcxt->saved_entityfunc = xmlGetExternalEntityLoader();
xmlSetExternalEntityLoader(xmlPgEntityLoader);
```
So since I am relying on a TEXT for the schema, there should be no
issues of that sort. It does however cut part of the grammar that
handles locations, which is part of the standard, and would require
this feature to be much bigger in scope...
4) I suppose you refer to doc/src/sgml/func/func-xml.sgml. Will do
5) Hmm my intent was to simply handle TEXT on the xmlschema portion,
so the expr rule on that side is indeed an oversight. Now about the
first argument, that is just following the pattern already specified
in other xml functions, like XMLPARSE and XMLSERIALIZE, which have the
same <XML Value Expression> specified in the grammar. So that might
have alraedy diverged from the standard a while back...
This is currently grammatically valid, for example:
```
select xmlparse(DOCUMENT (select oid from pg_class limit 1));
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
2619
^
```
As a summary, it does not fully implement schemas as first-class
objects, as that would require extra parts of the grammar specified in
the standard, so I capitulated to use schemas as provided text. That's
already safe in my understanding given the shielding in place. If we
are to implement the rest, I think other serious concerns would arise,
like role management, how to store schemas, etc. And when it comes
to that, is it worth all the trouble just for xml? I would like this
feature and I think the solution of relying on text is decent, since
the cost of complying 100% seems very high for low returns.
Em dom., 7 de dez. de 2025 às 04:34, Kirill Reshke <reshkekirill@gmail.com>
escreveu:
Show quoted text
On Sun, 7 Dec 2025 at 04:38, Marcos Magueta <maguetamarcos@gmail.com>
wrote:Hello!
I am likely one of the few people still using XML, but I noticed XSD
schema validation is still a TODO on postgresql, which I have some personal
use cases for.In this patch I attempt to implement XMLVALIDATE with the already in use
libxml following a version I got my hands on of the SQL/XML standard of
2016.In short, I had to add the ACCORDING word to comply with it and
completely ignored the version that was already in the src that fetches
arbitrary schemas (it refers to validations of dtds, which is another more
troublesome TODO).I had problems running the regression tests on my machine, so I could
only test the feature by spawning a modified instance of postgresql and
issuing queries through psql, therefore I am marking it as WIP. If anyone
can assert the tests pass, I would be glad.Also, this is my first patch, so I might have not followed standard
practices as best as I could, so please pay particular attention to that on
review.Cheers,
Marcos Magueta.HI!
1)
+ // Default case since nothing got returned + // out of the normal path for validation calls to libxmlPostgreSQL uses /**/ comments style.
2)
XML regression test suite fails, see attached. By the way, what are
your issues with running `make check` ?3)
By the way, in [0] we have this`
The function in PostgreSQL produces an “unimplemented” error, because
PostgreSQL does not have any implementation of the mechanism assumed
in the standard for registering schemas in advance, which is necessary
to address the security implications of a function that could refer to
schemas from arbitrary locations.
`How does your patch resolve this? I did not find any change in this area
4)
Also I want to mention that without a doc, the patch is not in a
commitable shape.5) I am a bit surprised by this grammar rule:
XMLVALIDATE '(' document_or_content a_expr ACCORDING TO XMLSCHEMA
a_expr ')'
this allow a wide class of expressions accepted by parser, like
`SELECT xmlvalidate(DOCUMENT (select oid from pg_class) ACCORDING TO
XMLSCHEMA (select 32)) AS is_valid FROM xml_validation_test;`Is this expected? a_expr is way more than string constants and column
references.. If yes, the regression test that you added, does not
test this..p.s. I failed to find in google SQL/XML standard of 2016. So, I cannot
double-check if this feature is compliant with it...[0] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards
--
Best regards,
Kirill Reshke
Hello again!
I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:
Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server. While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs. However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're not going
to risk it in a security patch that necessarily hasn't gotten wide review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string. A future patch may improve this.
With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.
I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.
At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.
Attachments:
0001-full-xmlvalidate-text-schema-implementation.patchapplication/octet-stream; name=0001-full-xmlvalidate-text-schema-implementation.patchDownload+799-5
Hello again!
Is there any interest in this? I understand PostgreSQL has bigger fish to
fry, but I would like to at least know; in case this was just forgotten.
Regards!
Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <
maguetamarcos@gmail.com> escreveu:
Show quoted text
Hello again!
I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server. While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs. However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're notgoing
to risk it in a security patch that necessarily hasn't gotten wide
review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string. A future patch may improve this.With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.
On Thu, 1 Jan 2026, 01:27 Marcos Magueta, <maguetamarcos@gmail.com> wrote:
Hello again!
Is there any interest in this? I understand PostgreSQL has bigger fish to
fry, but I would like to at least know; in case this was just forgotten.Regards!
Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <
maguetamarcos@gmail.com> escreveu:Hello again!
I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server. While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs. However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're notgoing
to risk it in a security patch that necessarily hasn't gotten wide
review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string. A future patch may improve this.With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.
Hi!
First of all, please do not top post 🙏 . Use down-posting.
About general interest in feature - I suspect that we as a community
generally interested in implementing items from TODO list. This feature
also increases SQL standard compatibility. But I am myself not a big
SQL/XML user, so I can only give limited review here. I also did not have
much time last month. I will try to find my cycles to give another look
here.
Show quoted text
On 1 Jan 2026, at 05:25, Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 1 Jan 2026, 01:27 Marcos Magueta, <maguetamarcos@gmail.com <mailto:maguetamarcos@gmail.com>> wrote:
Hello again!
Is there any interest in this? I understand PostgreSQL has bigger fish to fry, but I would like to at least know; in case this was just forgotten.
Regards!
Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <maguetamarcos@gmail.com <mailto:maguetamarcos@gmail.com>> escreveu:
Hello again!
I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server. While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs. However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're not going
to risk it in a security patch that necessarily hasn't gotten wide review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string. A future patch may improve this.With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.Hi!
First of all, please do not top post 🙏 . Use down-posting.About general interest in feature - I suspect that we as a community generally interested in implementing items from TODO list. This feature also increases SQL standard compatibility. But I am myself not a big SQL/XML user, so I can only give limited review here. I also did not have much time last month. I will try to find my cycles to give another look here.
Thank you very much for reaching back. Sorry about the bad e-mail etiquette, hopefully it’s corrected now.
About the patch, let me know if you find the time to review!
Thanks once again!
On Fri, 2 Jan 2026 at 23:07, Marcos Magueta <maguetamarcos@gmail.com> wrote:
On 1 Jan 2026, at 05:25, Kirill Reshke <reshkekirill@gmail.com> wrote:
On Thu, 1 Jan 2026, 01:27 Marcos Magueta, <maguetamarcos@gmail.com> wrote:
Hello again!
Is there any interest in this? I understand PostgreSQL has bigger fish to fry, but I would like to at least know; in case this was just forgotten.
Regards!
Em sex., 19 de dez. de 2025 às 00:25, Marcos Magueta <maguetamarcos@gmail.com> escreveu:
Hello again!
I took some time to actually finish this feature. I think the answers
for the previous questions are now clearer. I checked the
initialization and the protections are indeed in place since commit
a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically states:Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server. While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs. However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're not going
to risk it in a security patch that necessarily hasn't gotten wide review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string. A future patch may improve this.With that, the obvious affordance on the xmlvalidate implementation
was to not rely on external schema sources on the host
catalog. Therefore the implementation relies solely on expressions
that necessarily evaluate to a schema in plain text.I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.At last, to nourish the curiosity: I had issues with make check, as
stated above on the e-mail thread. These got resolved when I changed
`execl` to `execlp` on `pg_regress.c`. I of course did not commit
such, but more people I know have had the very same issue while
relying on immutable package managers.Hi!
First of all, please do not top post 🙏 . Use down-posting.About general interest in feature - I suspect that we as a community generally interested in implementing items from TODO list. This feature also increases SQL standard compatibility. But I am myself not a big SQL/XML user, so I can only give limited review here. I also did not have much time last month. I will try to find my cycles to give another look here.
Thank you very much for reaching back. Sorry about the bad e-mail etiquette, hopefully it’s corrected now.
About the patch, let me know if you find the time to review!
Thanks once again!
I registered this thread in the commitfest application[0]https://commitfest.postgresql.org/patch/6372/ to get more
attention from the community and also CF tests status.
It was already too late for january commitfest (PG-4), so this patch
is on PG19-Final.
As I understand you do not have any account, so please create one and
add yourself as a reviewer.
[0]: https://commitfest.postgresql.org/patch/6372/
--
Best regards,
Kirill Reshke
On 2 Jan 2026, at 23:07, Marcos Magueta <maguetamarcos@gmail.com> wrote:
About the patch, let me know if you find the time to review!
I was looking to review something on commitfest and decided to look into this patch.
Unfortunately, I cannot verify adherence to SQL standard. But I'll take it as granted, grammar changes are minimal.
I'm not a big XML user, but definitely there are a lot of use cases. E.g. If someone want to check whole database against new schema - this feature would be invaluable. I can think of many different use cases. But I heard some complaints about libxml. I'm not sure, but maybe at some point we would like to get rid of it? [0]/messages/by-id/aUK8aBluNzMZTatU@momjian.us
The patch fails regression tests on Windows. See [1]. Regression taken from [2]
Meaningfull part is:
diff --strip-trailing-cr -U3 C:/cirrus/src/test/regress/expected/xml_1.out C:/cirrus/build/testrun/regress/regress/results/xml.out
--- C:/cirrus/src/test/regress/expected/xml_1.out 2026-01-03 19:13:07.092850000 +0000
+++ C:/cirrus/build/testrun/regress/regress/results/xml.out 2026-01-03 19:17:23.497562500 +0000
@@ -1496,3 +1496,278 @@
LINE 1: SELECT xmltext('x'|| '<P>73</P>'::xml || .42 || true || 'j':...
^
DETAIL: This functionality requires the server to be built with libxml support.
+SELECT xmlvalidate(DOCUMENT '<person><name>John</name><age>30</age></person>'
+ ACCORDING TO XMLSCHEMA '<?xml version="1.0"?>
....
So you need to update src/test/regress/expected/xml_1.out for systems without libxml.
You use PG_TRY(); and return from that block. I found no other cases of returning without PG_END_TRY(), looks like it is not supported.
xmloption is document_or_content. But xmlvalidate_text_schema() always validates as a document. IDK, maybe it's correct, or maybe it works by accident.
+#else
+ NO_XML_SUPPORT();
+ return NULL;
+#endif
This NULL is returned from bool function xmlvalidate_text_schema(). I know it's unreachable, but let's return false or true.
Also, single-line comments worth converting to our usual C comments. The patch could benefit from pgindent.
That's all what I could find for now. Thanks for working on this!
Best regards, Andrey Borodin.
[0]: /messages/by-id/aUK8aBluNzMZTatU@momjian.us
[1]: https://api.cirrus-ci.com/v1/artifact/task/5580601438240768/testrun/build/testrun/regress/regress/regression.diffs
[2]: https://cirrus-ci.com/task/5580601438240768
Hi!
On 04/01/2026 10:46, Andrey Borodin wrote:
So you need to update src/test/regress/expected/xml_1.out for systems without libxml.
+1
There is also a xml_2.out. I don't really remember why it exists, but in
your case copying the changes from xml.out will most likely do the trick.
I am wondering whether passing the XML schema in every query is really
the right approach here. My main concern is that making the schema fully
user-controlled at execution time could lead to unwanted CPU or memory
usage -- not to mention the usability aspect of it. A catalog-level
schema registration mechanism would IMHO be a better fit, and it's also
supported by the SQL/XML standard:
"11.5 <XML valid according to clause>
Format
...
<XML valid according to identifier> ::= ID <registered XML schema name>
...
Syntax Rules
1) If <XML valid according to identifier> is specified, then the
<registered XML schema name> shall identify a registered XML schema RXS...
"
DB2 handles this similarly via REGISTER XMLSCHEMA.[1]https://www.ibm.com/docs/en/db2/11.5.x?topic=commands-register-xmlschema
Thanks for working on this!
Best, Jim
[1]: https://www.ibm.com/docs/en/db2/11.5.x?topic=commands-register-xmlschema
Thank you all for the careful review!
I'll go through the topics to fix the test and code changes today, but I
have a couple of questions about a catalog.
If we were to implement a catalog, I believe it would be either copying an
insert to a specified relation (created on demand) or to something in the
catalog, like pg_xmlschema. That could be a realistic change I could work
on. But what about the privilege level and file fetch support? I believe
it's not really an issue if the user is sufficiently privileged, so should
it mirror COPY FROM? I haven't seen its implementation, but I suppose it
already has security checks at the user privilege level. A valid
alternative to not deal with privileges and to leave the same restrictions
already in place to fetch arbitrary extensions to a specified schema; in
that way we are just moving the schema definition to another command before
being invoked and ignoring if it has any references outside of the plain
text specified (therefore, not using file://, like IBM, just text).
Surprisingly, the standard (I only have the 2016 here) leaves a great room
for freedom on how to implement the registration. It just specifies what it
should have:
An XML namespace NS contained in a registered XML Schema is
non-deterministic if NS contains a global
element declaration schema component that is non-deterministic.
A registered XML Schema is non-deterministic if it contains a
non-deterministic XML namespace.
A registered XML Schema is described by a registered XML Schema descriptor.
A registered XML Schema
descriptor includes:
— The target namespace URI of the registered XML Schema.
— The schema location URI of the registered XML Schema.
— The <registered XML Schema name> of the registered XML Schema.
— An indication of whether the registered XML Schema is permanently
registered.
— An indication of whether the registered XML Schema is non-deterministic.
— An unordered collection of the namespaces defined by the registered XML
Schema (the target namespace
is one of these namespaces).
— For each namespace defined by the registered XML Schema, an unordered
collection of the global element
declaration schema components in that namespace, with an indication for
each global element declaration
schema component whether that global element declaration schema component
is non-deterministic.
NOTE 9 — Without Feature X161, “Advanced Information Schema for registered
XML Schemas”, information whether an XML
Schema is deterministic, information about the collection of namespaces
defined in that XML Schema, and, for each such namespace
information about the global element declaration schema components in that
namespace, is not available in the XML_SCHEMAS,
XML_SCHEMA_NAMESPACES, and XML_SCHEMA_ELEMENTS views.
A registered XML Schema is identified by its <registered XML Schema name>.
I am tempted to go with a pg_xmlschema definition on the catalog and an
interface like the one IBM has, but still restricting file access. Dealing
with the security problems for that sounds excruciating. Any opinions?
Regards, Magueta.
Hi Marcos
On 05.01.26 18:49, Marcos Magueta wrote:
I am tempted to go with a pg_xmlschema definition on the catalog and an
interface like the one IBM has, but still restricting file access.
Dealing with the security problems for that sounds excruciating. Any
opinions?
Perhaps we need to first agree on some foundational aspects such as
design and syntax before going deeper into the code.
== return type ==
Your proposal returns a boolean, but the SQL/XML standard specifies
otherwise. In 6.21 <XML validate>:
"General Rule 8)
The result of <XML validate> is R."
where R is constructed as an XQuery sequence of nodes:
"General Rule 7)
Let R be an XQuery sequence enumerated by Rⱼ, 1 ≤ j ≤ N."
This may sound surprising at first glance, but it enables useful
patterns such as:
INSERT INTO t (c)
VALUES (XMLVALIDATE(x ACCORDING TO XMLSCHEMA s));
SELECT XMLSERIALIZE(
XMLVALIDATE(x ACCORDING TO XMLSCHEMA s)
AS text
);
In this model, validation failure is signaled via an error condition,
not by returning false.
== registered XML schemas ==
AFAICT the standard does not mandate any particular syntax for
registering XML schemas, so we are not required to implement REGISTER
XMLSCHEMA. Also, registered XML schemas must also be manageable objects,
which should be reflected in the proposed syntax. For example:
CREATE XMLSCHEMA foo AS '... XSD text ...';
CREATE XMLSCHEMA foo FROM file;
DROP XMLSCHEMA foo;
ALTER XMLSCHEMA foo RENAME TO bar;
ALTER XMLSCHEMA foo OWNER TO u;
ALTER XMLSCHEMA foo ADD '... new value ...';
and so on...
== permissions ==
Schema registration and usage should be privilege-controlled, for
example via dedicated roles:
GRANT pg_read_xmlschemas TO u;
GRANT pg_write_xmlschemas TO u;
...
After we have the XML schema management aspects figured out, we can move
on to XMLVALIDATE itself.
These are just my opinions. Let's also hear what the other reviewers
have to say before you start working on a v2.
Best, Jim
Hey Jim!
On 06.01.26, Jim Jones <jim.jones@uni-muenster.de> wrote:
The result of <XML validate> is R.
That was an oversight on my behalf, I had a hard time understanding the
standard, but now the validation of DOCUMENT and CONTENT being accepted
makes more sense.
The current patch has some issues.
xmloption is document_or_content. But xmlvalidate_text_schema() always
validates as a document.
As Andrey noticed, we should indeed support both a document and content.
Which entails into an iterative validation (for each node provided) on
content mode, so I should likely add the xmloption back. The fact it worked
with the example I created was actually luck.
Also, I am not sure if some variables used inside of the PG_TRY are memory
safe -- notice that none right now is set to volatile, despite being
accessed in different parts of the block; other functions in xml.c do
handle such correctly it seems (like xml_parse).
About the syntax proposal by Jim, I have no problems with complying to it.
It does increase considerably the scope from what I originally intended,
but that's the price to have something actually nice.
I can think of several useful extensions we could consider in a further
implementation:
Schema Dependencies/Imports
CREATE XMLSCHEMA base AS '...';
CREATE XMLSCHEMA extended
IMPORTS base
AS '...';
Schema Versioning
CREATE XMLSCHEMA patient VERSION '1.0' AS '...';
CREATE XMLSCHEMA patient VERSION '2.0' AS '...';
XMLVALIDATE(doc ACCORDING TO XMLSCHEMA patient VERSION '2.0')
Custom Error Messages
CREATE XMLSCHEMA patient
AS '...'
ERROR MESSAGE 'Patient record does not match schema v2.0';
Schema inference from samples (if the lib supports it, that is)
CREATE XMLSCHEMA patient
INFER FROM (SELECT data FROM patient_samples);
And much more, but perhaps that's already too ambitious for a first version.
I'll wait for the others to ring their bells.
Regards, Magueta.
Hello there, Postgres hackers!
Since we last talked I tried to actually implement a catalog for XML
schemas. It was a bit challenging personally.
I particularly had an issue with SYSCACHE so I can search the xmlschema
entries. I mostly tried to follow examples around the codebase (specially
when it comes to security, which I mostly mirrored from sequences), so if I
am misusing something, that's likely the problem, so please point that out.
Some highlights of the changes are:
What's implemented so far:
- New pg_xmlschema catalog (OIDs 6434 to 6438) for storing XML schema
definitions
- The validation function itself: XMLVALIDATE(DOCUMENT xml_expr ACCORDING
TO XMLSCHEMA xml_schema_name)
- DDL support: CREATE XMLSCHEMA, DROP XMLSCHEMA, ALTER XMLSCHEMA (with IF
NOT EXISTS, RENAME, SET SCHEMA, OWNER TO)
- Two syscaches (XMLSCHEMAOID and XMLSCHEMANAMENSP) for lookups during
parsing and execution
- Dependency tracking so schemas can't be dropped while views/rules
reference them (I think I even added a test for it, but it's worth checking
more closely)
- Integration with PostgreSQL's object system (event triggers, security
labels, permissions) -- I was mostly following what was in place already
and adding what seemed necessary
- Grants and role permissions
The flow is the following: schemas are registered via CREATE XMLSCHEMA,
then at parse time the schema name gets resolved to an OID (with
dependencies recorded), and finally at execution time the schema is fetched
by OID and used for validation. Please note that I only implemented
DOCUMENT mode for now, as it seemed like the most common use case and kept
things simpler. CONTENT mode by the standard seems to require another
keyword to make things useful (ELEMENT), which is a bit problematic since
it's a fairly used word to conflict on any database (it broke on the
current tests when I ran already, so...). With the ELEMENT we could do
things like: XMLVALIDATE(CONTENT '<author>C. J. Date</author>' ACCORDING TO
XMLSCHEMA book ELEMENT author), which is fairly powerful if you want to
validate just parts of the xml document, but again, it's scrapped from this
version because I wanted to pull my hair out dealing with it.
Following the SQL/XML standard, XMLVALIDATE returns the validated XML, not
a boolean anymore, and raises ERROR on validation failure. So now we can do
things like INSERT INTO r VALUES (XMLVALIDATE(...)) and other cool
compositions.
The documentation is still tracking the old xmlvalidate behavior and
nothing about the catalog changes.I can fix all that before submission if
the overall approach looks reasonable.
Permissions are a bit trickier, I thought of just relying on USAGE, since
the question for an XML validation is "Can I use this XML schema in a
call?". I think FDW works like that currently. I tried adding a bunch of
examples on xml.sql.
One change I specifically wanted feedback on are the changes on
parse_expr.c. Since I need to resolve the name of the schema, I thought
doing so at parsing time was the correct approach. In there I extract
schema name before the ResTarget loop runs, resolve it to OID using
get_xmlschema_oid(), save the string form in newx->name for deparsing (so
views can be displayed correctly) and finally clear x->named_args = NIL so
the ResTarget loop below has nothing to process. This approach reuses the
existing named_args field as temporary storage during parsing, then
converts it properly during transformation. It felt strange, but it avoids
adding another field to the XmlExpr structure just for XMLVALIDATE.
Also, I made the variables that get modified inside of the PG_TRY block
volatile as it seems to be a pattern likely to force the compiler to do
memory access instead of relying on registers during jumps.
I'm attaching the patch for another round of review (please excuse the lack
of the linter, I tried running but had some issues). Any feedback would be
really appreciated, this was quite an adventure.
PS: I noticed now that I haven't copied the changes over to xml1.out nor
xml2.out. But This is just for the feature validation round, so I can fix
that later.
Thanks for taking a look!
Attachments:
xmlschema_xmlvalidate.patchapplication/octet-stream; name=xmlschema_xmlvalidate.patchDownload+3482-1651
Hi
On 10/01/2026 06:26, Marcos Magueta wrote:
Since we last talked I tried to actually implement a catalog for XML
schemas.
The code is no longer compiling. The header pg_xmlschema_d.h wasn't
added to the patch (probably containing the definitions of
Anum_pg_xmlschema_oid, Anum_pg_xmlschema_schemaname, etc ..)
In file included from alter.c:44:
../../../src/include/catalog/pg_xmlschema.h:5:10: fatal error:
catalog/pg_xmlschema_d.h: No such file or directory
5 | #include "catalog/pg_xmlschema_d.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
You might also want to take a look at the cfbot errors and compilation
warnings[1,2]
Best, Jim
[1]: https://api.cirrus-ci.com/v1/artifact/task/6032809283813376/testrun/build/testrun/regress/regress/regression.diffs
https://api.cirrus-ci.com/v1/artifact/task/6032809283813376/testrun/build/testrun/regress/regress/regression.diffs
[2]: https://cirrus-ci.com/task/6173546772168704?logs=gcc_warning#L453
On 01.12.2026 12:54, Jim Jones <jim.jones@uni-muenster.de> wrote:
The code is no longer compiling. The header pg_xmlschema_d.h wasn't
added to the patch (probably containing the definitions of
Anum_pg_xmlschema_oid, Anum_pg_xmlschema_schemaname, etc ..)
I forgot to edit the makefiles so they generate those, I was building with
a powershell script. I could compile on Linux and all tests are now
passing, so I hope that was it.
Please follow the updated version attached.
Thanks for taking a look at this!
Marcos Magueta.
Attachments:
0002-xmlschema-catalog-and-xmlvalidate.patchapplication/octet-stream; name=0002-xmlschema-catalog-and-xmlvalidate.patchDownload+1902-9
On 14/01/2026 02:23, Marcos Magueta wrote:
Please follow the updated version attached.
A few comments:
== grammar ==
In gram.y you restricted CREATE XMLSCHEMA to Sconst:
DO $$
DECLARE xsd text := '<foo></foo>';
BEGIN
CREATE XMLSCHEMA person_schema AS xsd;
END $$;
ERROR: syntax error at or near "xsd"
LINE 4: CREATE XMLSCHEMA person_schema AS xsd;
Any particular reason for that? If not, take a look at other options,
e.g. a_expr
== pg_xmlschema ==
Why did you choose text over xml for schemadata?
postgres=# \d pg_xmlschema
Table "pg_catalog.pg_xmlschema"
Column | Type | Collation | Nullable | Default
-----------------+-----------+-----------+----------+---------
oid | oid | | not null |
schemaname | name | | not null |
schemanamespace | oid | | not null |
schemaowner | oid | | not null |
schemadata | text | C | not null |
schemaacl | aclitem[] | | |
Indexes:
"pg_xmlschema_oid_index" PRIMARY KEY, btree (oid)
"pg_xmlschema_name_nsp_index" UNIQUE CONSTRAINT, btree (schemaname,
schemanamespace)
== psql command to display and list xml schemas ==
Not a requirement for this patch (specially not at the current stage),
but you should add it to your TODO list.
\dz
\dz foo
\dz+ foo
* z here is just an example
== tab completion ==
CREATE <TAB> should suggest XMLSCHEMA and CREATE XML<TAB> should
autocomplete CREATE XMLSCHEMA. The same applies for DROP XMLSCHEMA [IF
EXISTS], where it should additionally list the available schemas after
DROP XMLSCHEMA <TAB>.
== white-space warnings ==
The patch does not apply cleanly:
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:594:
indent with spaces.
Oid schemanamespace,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:595:
indent with spaces.
Oid schemaowner,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:596:
indent with spaces.
const char *schemadata,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:597:
indent with spaces.
bool if_not_exists,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:598:
indent with spaces.
bool quiet)
warning: squelched 139 whitespace errors
warning: 144 lines add whitespace errors.
== file naming ==
Your patch suggests that it is part of a patch set, from which 0001 is
missing. In case you meant a version 2 of the previous patch, a better
format would be
v2-0001-xmlschema-catalog-and-xmlvalidate.patch
which can be generated with
$ git format-patch -1 -v2
== xml_1.out not updated ==
After every change in xml.sql you must create an equivalent file for a
postgres compiled without --with-libxml, and put the changes in
xml_1.out.[1]https://cirrus-ci.com/task/4872456290172928?logs=check_world#L126
== corrupt pg_dump ==
I understand we agreed to work on XMLVALIDATE only after CREATE
XMLSCHEMA is settled, but since the code is partially already there, you
might wanna take a look at pg_dump. It is not serialising the CREATE
XMLSCHEMA statements:
$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.
postgres=# CREATE XMLSCHEMA person_schema AS '<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>';
CREATE XMLSCHEMA
postgres=# CREATE VIEW v AS
SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml
ACCORDING TO XMLSCHEMA person_schema);
CREATE VIEW
postgres=# \q
$ /usr/local/postgres-dev/bin/pg_dump postgres
--
-- PostgreSQL database dump
--
\restrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd
-- Dumped from database version 19devel
-- Dumped by pg_dump version 19devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET transaction_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
--
-- Name: test_xmlschema_ns; Type: SCHEMA; Schema: -; Owner: jim
--
CREATE SCHEMA test_xmlschema_ns;
ALTER SCHEMA test_xmlschema_ns OWNER TO jim;
--
-- Name: v; Type: VIEW; Schema: public; Owner: jim
--
CREATE VIEW public.v AS
SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml ACCORDING TO XMLSCHEMA
person_schema) AS "xmlvalidate";
ALTER VIEW public.v OWNER TO jim;
--
-- PostgreSQL database dump complete
--
\unrestrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd
Take a look at pg_dump.c. You might need a new function, e.g.
dumpXmlSchemas(Archive *fout, const SchemaInfo *schemaInfo)
== patch structure ==
To make the review a bit easier, I suggest to split this patch into a
patch set with **at least 4** smaller patches - the more seasoned
hackers here might correct me if I am wrong. For instance:
0001 - CREATE XMLSCHEMA (code + tests + documentation)
0002 - pg_dump changes to output CREATE XMLSCHEMA
0003 - psql tab completion + new command to display and list xml schemas
0004 - XMLVALIDATE (code + tests + documentation)
Thanks for working on this!
Best, Jim
[1]: https://cirrus-ci.com/task/4872456290172928?logs=check_world#L126
Hey Jim,
Any particular reason for that? If not, take a look at other options,
e.g. a_expr
No particular reason apart from it being simpler since I didn't need to
invoke an execution at the cmd. Changed it now.
Why did you choose text over xml for schemadata?
My original thought was that XML schemas require additional validation in
contrast to normal XML, but it being additive, we would have redundant
checks. But in reconsideration, perhaps keeping the field with an XML type
is more intuitive for anyone introspecting over the catalog. Also applied
the change on the latest version of the patch.
Just a question about something I am very curious about the previous patch
I sent:
I noticed DefineXmlSchema() calls IsThereXmlSchemaInNamespace() right after
XmlSchemaCreate() returns a valid OID. Since XmlSchemaCreate() already
inserted the tuple into the catalog (via CatalogTupleInsert at
pg_xmlschema.c:166), wouldn't SearchSysCacheExists2() find it and always
throw "already exists"? We all tested the original code and it worked fine,
so I'm missing something about syscache visibility or timing; that was an
early function I did to check for duplicates that ended up in the wrong
place. I removed the call (and function) as I judged it to be redundant
(the duplicate check already happens inside XmlSchemaCreate()), but is
there something subtle about intra-command visibility I'm not
understanding? If anyone knows, please let me know.
I tried to split the patch into multiple, as recommended, but there might
still exist some overlaps when it comes to the division of the
implementation of XMLVALIDATE and XMLSCHEMA. I put the docs and tests on
the last patch, as I had issues amending.
Also, I added tab completion on psql and fixed pg_dump.
Regards, Marcos Magueta
Em qua., 14 de jan. de 2026 às 07:10, Jim Jones <jim.jones@uni-muenster.de>
escreveu:
Show quoted text
On 14/01/2026 02:23, Marcos Magueta wrote:
Please follow the updated version attached.
A few comments:
== grammar ==
In gram.y you restricted CREATE XMLSCHEMA to Sconst:
DO $$
DECLARE xsd text := '<foo></foo>';
BEGIN
CREATE XMLSCHEMA person_schema AS xsd;
END $$;
ERROR: syntax error at or near "xsd"
LINE 4: CREATE XMLSCHEMA person_schema AS xsd;Any particular reason for that? If not, take a look at other options,
e.g. a_expr== pg_xmlschema ==
Why did you choose text over xml for schemadata?
postgres=# \d pg_xmlschema
Table "pg_catalog.pg_xmlschema"
Column | Type | Collation | Nullable | Default
-----------------+-----------+-----------+----------+---------
oid | oid | | not null |
schemaname | name | | not null |
schemanamespace | oid | | not null |
schemaowner | oid | | not null |
schemadata | text | C | not null |
schemaacl | aclitem[] | | |
Indexes:
"pg_xmlschema_oid_index" PRIMARY KEY, btree (oid)
"pg_xmlschema_name_nsp_index" UNIQUE CONSTRAINT, btree (schemaname,
schemanamespace)== psql command to display and list xml schemas ==
Not a requirement for this patch (specially not at the current stage),
but you should add it to your TODO list.\dz
\dz foo
\dz+ foo* z here is just an example
== tab completion ==
CREATE <TAB> should suggest XMLSCHEMA and CREATE XML<TAB> should
autocomplete CREATE XMLSCHEMA. The same applies for DROP XMLSCHEMA [IF
EXISTS], where it should additionally list the available schemas after
DROP XMLSCHEMA <TAB>.== white-space warnings ==
The patch does not apply cleanly:
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:594:
indent with spaces.
Oid schemanamespace,/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:595:
indent with spaces.
Oid schemaowner,/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:596:
indent with spaces.
const char *schemadata,/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:597:
indent with spaces.
bool if_not_exists,/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:598:
indent with spaces.
bool quiet)
warning: squelched 139 whitespace errors
warning: 144 lines add whitespace errors.== file naming ==
Your patch suggests that it is part of a patch set, from which 0001 is
missing. In case you meant a version 2 of the previous patch, a better
format would bev2-0001-xmlschema-catalog-and-xmlvalidate.patch
which can be generated with
$ git format-patch -1 -v2
== xml_1.out not updated ==
After every change in xml.sql you must create an equivalent file for a
postgres compiled without --with-libxml, and put the changes in
xml_1.out.[1]== corrupt pg_dump ==
I understand we agreed to work on XMLVALIDATE only after CREATE
XMLSCHEMA is settled, but since the code is partially already there, you
might wanna take a look at pg_dump. It is not serialising the CREATE
XMLSCHEMA statements:$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.postgres=# CREATE XMLSCHEMA person_schema AS '<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>';
CREATE XMLSCHEMA
postgres=# CREATE VIEW v AS
SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml
ACCORDING TO XMLSCHEMA person_schema);
CREATE VIEW
postgres=# \q$ /usr/local/postgres-dev/bin/pg_dump postgres
--
-- PostgreSQL database dump
--\restrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd
-- Dumped from database version 19devel
-- Dumped by pg_dump version 19develSET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET transaction_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;--
-- Name: test_xmlschema_ns; Type: SCHEMA; Schema: -; Owner: jim
--CREATE SCHEMA test_xmlschema_ns;
ALTER SCHEMA test_xmlschema_ns OWNER TO jim;
--
-- Name: v; Type: VIEW; Schema: public; Owner: jim
--CREATE VIEW public.v AS
SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml ACCORDING TO XMLSCHEMA
person_schema) AS "xmlvalidate";ALTER VIEW public.v OWNER TO jim;
--
-- PostgreSQL database dump complete
--\unrestrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd
Take a look at pg_dump.c. You might need a new function, e.g.
dumpXmlSchemas(Archive *fout, const SchemaInfo *schemaInfo)== patch structure ==
To make the review a bit easier, I suggest to split this patch into a
patch set with **at least 4** smaller patches - the more seasoned
hackers here might correct me if I am wrong. For instance:0001 - CREATE XMLSCHEMA (code + tests + documentation)
0002 - pg_dump changes to output CREATE XMLSCHEMA
0003 - psql tab completion + new command to display and list xml schemas
0004 - XMLVALIDATE (code + tests + documentation)Thanks for working on this!
Best, Jim
[1] https://cirrus-ci.com/task/4872456290172928?logs=check_world#L126
Attachments:
0004-Add-XMLVALIDATE-function-for-XML-schema-validation.patchapplication/octet-stream; name=0004-Add-XMLVALIDATE-function-for-XML-schema-validation.patchDownload+217-3
0003-Add-psql-tab-completion-for-XMLSCHEMA-commands.patchapplication/octet-stream; name=0003-Add-psql-tab-completion-for-XMLSCHEMA-commands.patchDownload+22-1
0002-Add-pg_dump-support-for-XMLSCHEMA-objects.patchapplication/octet-stream; name=0002-Add-pg_dump-support-for-XMLSCHEMA-objects.patchDownload+140-1
0001-Add-CREATE-ALTER-DROP-XMLSCHEMA-DDL-commands.patchapplication/octet-stream; name=0001-Add-CREATE-ALTER-DROP-XMLSCHEMA-DDL-commands.patchDownload+734-7
0005-Add-tests-and-documentation-for-XMLSCHEMA-feature.patchapplication/octet-stream; name=0005-Add-tests-and-documentation-for-XMLSCHEMA-feature.patchDownload+3336-1647
On 21/01/2026 21:44, Marcos Magueta wrote:
Any particular reason for that? If not, take a look at other options,
e.g. a_expr
No particular reason apart from it being simpler since I didn't need to
invoke an execution at the cmd. Changed it now.Why did you choose text over xml for schemadata?
My original thought was that XML schemas require additional validation
in contrast to normal XML, but it being additive, we would have
redundant checks. But in reconsideration, perhaps keeping the field with
an XML type is more intuitive for anyone introspecting over the catalog.
Also applied the change on the latest version of the patch.
Data type for schemadata in pg_xmlschema is now xml.
postgres=# \d pg_xmlschema
Table "pg_catalog.pg_xmlschema"
Column | Type | Collation | Nullable | Default
-----------------+-----------+-----------+----------+---------
oid | oid | | not null |
schemaname | name | | not null |
schemanamespace | oid | | not null |
schemaowner | oid | | not null |
schemadata | xml | | not null |
schemaacl | aclitem[] | | |
Indexes:
"pg_xmlschema_oid_index" PRIMARY KEY, btree (oid)
"pg_xmlschema_name_nsp_index" UNIQUE CONSTRAINT, btree (schemaname,
schemanamespace)
I agree it's more intuitive this way. It also facilitates function calls
that require the parameter to be xml, e.g. xmlserialize
postgres=# CREATE XMLSCHEMA x AS
'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:element
name="duplicate" type="xs:string"/></xs:schema>';
CREATE XMLSCHEMA
postgres=# SELECT xmlserialize(DOCUMENT schemadata AS text INDENT) FROM
pg_xmlschema;
xmlserialize
---------------------------------------------------------
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">+
<xs:element name="duplicate" type="xs:string"/> +
</xs:schema>
(1 row)
I noticed DefineXmlSchema() calls IsThereXmlSchemaInNamespace() right
after XmlSchemaCreate() returns a valid OID. Since XmlSchemaCreate()
already inserted the tuple into the catalog (via CatalogTupleInsert at
pg_xmlschema.c:166), wouldn't SearchSysCacheExists2() find it and always
throw "already exists"? We all tested the original code and it worked
fine, so I'm missing something about syscache visibility or timing; that
was an early function I did to check for duplicates that ended up in the
wrong place. I removed the call (and function) as I judged it to be
redundant (the duplicate check already happens inside
XmlSchemaCreate()), but is there something subtle about intra-command
visibility I'm not understanding? If anyone knows, please let me know.
I couldn't find any IsThereXmlSchemaInNamespace call in DefineXmlSchema
in the current version, so I cannot say much here. But I agree that the
a further check is not necessary, since XmlSchemaCreate is already doing it.
Also, I added tab completion on psql and fixed pg_dump.
Nice. pg_dump now exports CREATE XMLSCHEMA statements.
Tab completion for CREATE, ALTER, and DROP XMLSCHEMA now also works.
A few other comments
== patch version ==
You forgot to include the version to the patch name.
For instance, instead of
0001-Add-CREATE-ALTER-DROP-XMLSCHEMA-DDL-commands.patch the file could
be named v3-0001-Add-CREATE-ALTER-DROP-XMLSCHEMA-DDL-commands.patch
== IS_XMLVALIDATE dependency ==
The patches 0001, 0002, and 0003 depend on IS_XMLVALIDATE, which is only
introduced in 0004, so they cannot be compiled and tested independently.
== permissions ==
In the tests I see you added a few GRANTs to set the visibility of
certain xmlschemas:
GRANT USAGE ON XMLSCHEMA permission_test_schema TO regress_xmlschema_user2
I could not find anything regarding this in the docs. If we are to
support it, shouldn't we add it to grant.sgml?
As I mentioned upthread, I believe that schema registration and usage
should be privilege-controlled, for example via dedicated roles
GRANT pg_read_xmlschemas TO u;
GRANT pg_write_xmlschemas TO u;
What do you think?
But being able to grant or revoke access to a certain xmlschema also has
its appeal :)
Best, Jim
On 23/01/2026 13:19, Jim Jones wrote:
On 21/01/2026 21:44, Marcos Magueta wrote:
Any particular reason for that? If not, take a look at other options,
e.g. a_expr
No particular reason apart from it being simpler since I didn't need to
invoke an execution at the cmd. Changed it now.
On second thought, is there any scenario where we'll need a_expr for "y"
at all in "CREATE XMLSCHEMA x AS y"? Isn't it always going to be a
string? I see now that my example in the previous post was somewhat
misleading (sorry for the noise).
Best, Jim