[PATCH] contrib/xml2: xslt_process() should report XSLT-related error details
Dear PG hackers,
here is a patch that improves the xslt_process() error handling. Error
handling was one of the points criticized earlier [1]/messages/by-id/4C5ECEF9.3030806@mlfowler.com and as far as I
understand it is one of the reasons that xslt_process() was never adopted
into core and contrib/xml2 is considered deprecated since PG 8. However the
point is also still on the TODO list ("Report errors returned by the XSLT
library") and xslt_process() has been asked for several times on the mailing
list. Memory handling bugs that have been reported earlier [2]/messages/by-id/4A6A276A.6090405@dunslane.net have obviously
long been fixed.
The point raised in [1]/messages/by-id/4C5ECEF9.3030806@mlfowler.com was that xslt_process() would not report missing
stylesheet parameters at all. This hasn't been the case at the current HEAD.
However, xslt_process() still wouldn't report any details about errors
related to XSLT processing. This is what is fixed by this patch. It now
installs an XSLT error handler. See the patch comment for details. Since the
goal is to eventually get xslt_process() adopted into core, I also got rid of
PG_XML_STRICTNESS_LEGACY.
Perhaps you can tell me what else is preventing adoption into core. I believe
that xslt_process() should also accept the `xml` type as an alternative to
strings. Strings should be kept for backwards compatibility, though. Also,
the stylesheet parameter passing is very crude and limited. I suggest, that
the current method of passing them as strings (see parse_params()) is
deprecated and we don't try to tackle its limitations, but will instead
accept hstores. If you agree, I will work on these two tasks next.
Yours sincerely,
Robin Haberkorn
[1]: /messages/by-id/4C5ECEF9.3030806@mlfowler.com
[2]: /messages/by-id/4A6A276A.6090405@dunslane.net
--
Robin Haberkorn
Senior Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
I forgot the patch of course...
--
Robin Haberkorn
Senior Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
Attachments:
0001-contrib-xml2-xslt_process-now-reports-XSLT-related-e.patchtext/x-patch; charset=utf-8; name=0001-contrib-xml2-xslt_process-now-reports-XSLT-related-e.patchDownload
From cc5be302aa4305403b8131f6c22c39e4dfb75bd1 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Thu, 17 Apr 2025 13:15:48 +0300
Subject: [PATCH] contrib/xml2: xslt_process() now reports XSLT-related error
details
* It sets (and restores) the libxslt error handler (xsltSetGenericErrorFunc()).
Since it only supports old-school "generic" error handlers, which are no longer
used in PG's libxml code, we reintroduced a "generic" error handler
xml_generic_error_handler() in xml.c.
* The alternative would have been to expose PgXmlErrorContext in xml.h,
so we could implement a generic handler in xslt_proc.c.
This is obviously not desirable, as it would depend on USE_LIBXML.
* No longer use the PG_XML_STRICTNESS_LEGACY for error handling,
but query the error_occurred-flag via pg_xml_error_occurred().
The existance of this getter is another hint, that we are not supposed
to expose PgXmlErrorContext in xml.h.
* This change means that xslt_process() now reports not only details
about XML parsing errors, but XSLT-schema deviations and missing
stylesheet parameters as well.
* The XSLT error messages now also contain line numbers.
For that to work, we had to set a dummy "SQL" URL when parsing XML strings.
This is important, since xsltPrintErrorContext() includes
line numbers only if an URL is set.
* The xsltSaveResultToString() error handling has been removed.
It can practically only fail in OOM situations and there is no reason
to handle them any different than with the other libxslt functions.
* Updated test suite and added test case for detecting missing
stylesheet parameters.
This was initially reported here but has obviously been fixed in the
meantime:
https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com
---
contrib/xml2/expected/xml2.out | 17 +++++++++++++++
contrib/xml2/sql/xml2.sql | 8 +++++++
contrib/xml2/xslt_proc.c | 40 ++++++++++++++++++++++++----------
src/backend/utils/adt/xml.c | 37 +++++++++++++++++++++++++++++++
src/include/utils/xml.h | 2 ++
5 files changed, 92 insertions(+), 12 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 3d97b14..157d584 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -261,3 +261,20 @@ $$<xsl:stylesheet version="1.0"
</xsl:template>
</xsl:stylesheet>$$);
ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 7 element output
+File write for 0wn3d.txt refused
+runtime error: file SQL line 7 element output
+xsltDocumentElem: write rights for 0wn3d.txt denied
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
+ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 3 element value-of
+Variable 'n1' has not been declared.
+Undefined variable
+runtime error: file SQL line 3 element value-of
+XPath evaluation returned no result.
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index ef99d16..9d42ac8 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -153,3 +153,11 @@ $$<xsl:stylesheet version="1.0"
</sax:output>
</xsl:template>
</xsl:stylesheet>$$);
+
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index b720d89..f17cf9f 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -61,6 +61,10 @@ xslt_process(PG_FUNCTION_ARGS)
xmlChar *resstr = NULL;
int reslen = 0;
+ /* the previous libxslt error context */
+ xmlGenericErrorFunc saved_errfunc;
+ void *saved_errcxt;
+
if (fcinfo->nargs == 3)
{
paramstr = PG_GETARG_TEXT_PP(2);
@@ -74,35 +78,46 @@ xslt_process(PG_FUNCTION_ARGS)
}
/* Setup parser */
- xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
+ xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
+
+ /*
+ * Save the previous libxslt error context.
+ */
+ saved_errfunc = xsltGenericError;
+ saved_errcxt = xsltGenericErrorContext;
+ xsltSetGenericErrorFunc(xmlerrcxt, xml_generic_error_handler);
PG_TRY();
{
xmlDocPtr ssdoc;
bool xslt_sec_prefs_error;
- /* Parse document */
+ /*
+ * Parse document.
+ * It's important to set an "URL", so libxslt includes
+ * line numbers in error messages (cf. xsltPrintErrorContext()).
+ */
doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), NULL, NULL,
+ VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
XML_PARSE_NOENT);
- if (doctree == NULL)
+ if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing XML document");
/* Same for stylesheet */
ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), NULL, NULL,
+ VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
XML_PARSE_NOENT);
- if (ssdoc == NULL)
+ if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing stylesheet as XML document");
/* After this call we need not free ssdoc separately */
stylesheet = xsltParseStylesheetDoc(ssdoc);
- if (stylesheet == NULL)
+ if (stylesheet == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
"failed to parse stylesheet");
@@ -137,11 +152,14 @@ xslt_process(PG_FUNCTION_ARGS)
restree = xsltApplyStylesheetUser(stylesheet, doctree, params,
NULL, NULL, xslt_ctxt);
- if (restree == NULL)
+ if (restree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
"failed to apply stylesheet");
resstat = xsltSaveResultToString(&resstr, &reslen, restree, stylesheet);
+ if (resstat < 0 || pg_xml_error_occurred(xmlerrcxt))
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
+ "failed to save result to string");
}
PG_CATCH();
{
@@ -157,6 +175,7 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFreeDoc(doctree);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, true);
PG_RE_THROW();
@@ -170,12 +189,9 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFreeDoc(doctree);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, false);
- /* XXX this is pretty dubious, really ought to throw error instead */
- if (resstat < 0)
- PG_RETURN_NULL();
-
result = cstring_to_text_with_len((char *) resstr, reslen);
if (resstr)
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index db8d0d6..74f78c5 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -2078,6 +2078,43 @@ xml_errsave(Node *escontext, PgXmlErrorContext *errcxt,
detail ? errdetail_internal("%s", detail) : 0));
}
+/*
+ * Generic error handler for libxml errors and warnings.
+ * This is not used by this module, but may be useful for
+ * libxml-based libraries like libxslt, which do not support
+ * structured error handlers.
+ */
+void
+xml_generic_error_handler(void *data, const char * msg, ...)
+{
+ PgXmlErrorContext *xmlerrcxt = (PgXmlErrorContext *) data;
+ StringInfo errorBuf;
+ va_list ap;
+
+ /*
+ * Defend against someone passing us a bogus context struct.
+ *
+ * We force a backend exit if this check fails because longjmp'ing out of
+ * libxslt would likely render it unsafe to use further.
+ */
+ if (xmlerrcxt->magic != ERRCXT_MAGIC)
+ elog(FATAL, "xml_generic_error_handler called with invalid PgXmlErrorContext");
+
+ /* Prepare error message in errorBuf */
+ errorBuf = makeStringInfo();
+ va_start(ap, msg);
+ appendStringInfoVA(errorBuf, msg, ap);
+ va_end(ap);
+
+ /* Get rid of any trailing newlines in errorBuf */
+ chopStringInfoNewlines(errorBuf);
+
+ appendStringInfoLineSeparator(&xmlerrcxt->err_buf);
+ appendBinaryStringInfo(&xmlerrcxt->err_buf, errorBuf->data,
+ errorBuf->len);
+
+ destroyStringInfo(errorBuf);
+}
/*
* Error handler for libxml errors and warnings
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 0d7a816..a754d8f 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -66,6 +66,8 @@ extern void pg_xml_init_library(void);
extern PgXmlErrorContext *pg_xml_init(PgXmlStrictness strictness);
extern void pg_xml_done(PgXmlErrorContext *errcxt, bool isError);
extern bool pg_xml_error_occurred(PgXmlErrorContext *errcxt);
+extern void xml_generic_error_handler(void *data, const char * msg, ...)
+ pg_attribute_printf(2, 3);
extern void xml_ereport(PgXmlErrorContext *errcxt, int level, int sqlcode,
const char *msg);
--
2.48.1
"Robin Haberkorn" <haberkorn@b1-systems.de> writes:
I forgot the patch of course...
Hi Robin, thank you for the patch! This has arrived too late to
be considered for PG v18, but please add the thread to the open
commitfest
https://commitfest.postgresql.org/53/
so that we don't forget about it for v19, and so that it will
receive automated testing from our cfbot.
Just from a quick eyeballing of the patch, it seems generally
sane, but I wonder why you have xml_generic_error_handler
creating and then destroying an errorBuf instead of just
appending the results directly to xmlerrcxt->err_buf. The
extra buffer doesn't seem to be doing much except increasing
our odds of an OOM failure right there.
I'm also a little suspicious of
- xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
+ xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
as that looks like it is probably changing the behavior of
the module in ways bigger than just providing more error
details. This code has been in legacy status for so long
that I feel a little hesitant about doing that. If we are
willing to do it, should we go further and clean out the use
of PG_XML_STRICTNESS_LEGACY in xml2/xpath.c as well?
regards, tom lane
Hello Tom!
On Tue Apr 22, 2025 at 18:29:02 GMT +03, Tom Lane wrote:
Hi Robin, thank you for the patch! This has arrived too late to
be considered for PG v18, but please add the thread to the open
commitfest
I will do that. But what about the other changes I proposed (xml data type
and hstore support)? Should they all be discussed in this thread or should I
create separate threads and consequently separate patch entries for the
Commitfest? And should they all be based on the master branch or can they be
based upon each other? Do you think they will help to bring xslt_process()
closer to adoption into core?
Just from a quick eyeballing of the patch, it seems generally
sane, but I wonder why you have xml_generic_error_handler
creating and then destroying an errorBuf instead of just
appending the results directly to xmlerrcxt->err_buf. The
extra buffer doesn't seem to be doing much except increasing
our odds of an OOM failure right there.
You are right. It was a leftover from thinning out xml_errorHandler().
I attached a new version of the patch where this is fixed.
I also ran all touched files through pgindent.
I'm also a little suspicious of
- xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY); + xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);as that looks like it is probably changing the behavior of
the module in ways bigger than just providing more error
details. This code has been in legacy status for so long
that I feel a little hesitant about doing that.
You wouldn't actually need that change to get more error details as the
details are provided by xml_generic_error_handler() which ignores the
strictness level. The reason is that libxml generic error handlers cannot
discern between warnings and errors. The only thing that the strictness level
does is affecting how xml_errorHandler() works: With STRICTNESS_ALL, it can
actually log warnings without appending anything to the err_buf and set the
err_occurred flag which is apparently useful because some libxml functions
can result in error-level messages without the function calls signalling
failure. At least, otherwise the code doesn't make sense to me. In other
words, using the STRICTNESS_ALL should allow you to see warnings, you
wouldn't otherwise see and catch errors, you wouldn't otherwise catch -- at
least if they occur in libxml itself. As far as I understand from the code
comment xml_errorHandler(), STRICTNESS_LEGACY was introduced, just so you
don't have to touch the existing contrib/xml2 code that did not check
err_occurred:
/*
* Legacy error handling mode. err_occurred is never set, we just add the
* message to err_buf. This mode exists because the xml2 contrib module
* uses our error-handling infrastructure, but we don't want to change its
* behaviour since it's deprecated anyway. This is also why we don't
* distinguish between notices, warnings and errors here --- the old-style
* generic error handler wouldn't have done that either.
*/
Since we now check that flag in xslt_proc.c, we should be fine.
But I might be overlooking something.
If we are willing to do it, should we go further and clean out the use
of PG_XML_STRICTNESS_LEGACY in xml2/xpath.c as well?
I think this is a separate construction site. We don't necessarily have
to touch xpath.c and I didn't plan to do so in the near future.
We should first determine whether the XPath support is actually worth
keeping (should be adopted into core) or rather stay deprecated.
Best regards,
Robin Haberkorn
--
Robin Haberkorn
Senior Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
Attachments:
v2-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchtext/x-patch; charset=utf-8; name=v2-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchDownload
From b4d9a6594142cf85e4e38352f6be3cea260962f7 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Thu, 17 Apr 2025 13:15:48 +0300
Subject: [PATCH v2] contrib/xml2: xslt_process() now reports XSLT-related
error details
* It sets (and restores) the libxslt error handler (xsltSetGenericErrorFunc()).
Since it only supports old-school "generic" error handlers, which are no longer
used in PG's libxml code, we reintroduced a "generic" error handler
xml_generic_error_handler() in xml.c.
* The alternative would have been to expose PgXmlErrorContext in xml.h,
so we could implement a generic handler in xslt_proc.c.
This is obviously not desirable, as it would depend on USE_LIBXML.
* No longer use the PG_XML_STRICTNESS_LEGACY for error handling,
but query the error_occurred-flag via pg_xml_error_occurred().
The existance of this getter is another hint, that we are not supposed
to expose PgXmlErrorContext in xml.h.
* This change means that xslt_process() now reports not only details
about XML parsing errors, but XSLT-schema deviations and missing
stylesheet parameters as well.
* The XSLT error messages now also contain line numbers.
For that to work, we had to set a dummy "SQL" URL when parsing XML strings.
This is important, since xsltPrintErrorContext() includes
line numbers only if an URL is set.
* The xsltSaveResultToString() error handling has been removed.
It can practically only fail in OOM situations and there is no reason
to handle them any different than with the other libxslt functions.
* Updated test suite and added test case for detecting missing
stylesheet parameters.
This was initially reported here but has obviously been fixed in the
meantime:
https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com
---
contrib/xml2/expected/xml2.out | 17 +++++++++++++++
contrib/xml2/sql/xml2.sql | 8 +++++++
contrib/xml2/xslt_proc.c | 39 +++++++++++++++++++++++-----------
src/backend/utils/adt/xml.c | 29 +++++++++++++++++++++++++
src/include/utils/xml.h | 2 ++
5 files changed, 83 insertions(+), 12 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 3d97b14..157d584 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -261,3 +261,20 @@ $$<xsl:stylesheet version="1.0"
</xsl:template>
</xsl:stylesheet>$$);
ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 7 element output
+File write for 0wn3d.txt refused
+runtime error: file SQL line 7 element output
+xsltDocumentElem: write rights for 0wn3d.txt denied
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
+ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 3 element value-of
+Variable 'n1' has not been declared.
+Undefined variable
+runtime error: file SQL line 3 element value-of
+XPath evaluation returned no result.
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index ef99d16..9d42ac8 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -153,3 +153,11 @@ $$<xsl:stylesheet version="1.0"
</sax:output>
</xsl:template>
</xsl:stylesheet>$$);
+
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index b720d89..2aa44d3 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -61,6 +61,10 @@ xslt_process(PG_FUNCTION_ARGS)
xmlChar *resstr = NULL;
int reslen = 0;
+ /* the previous libxslt error context */
+ xmlGenericErrorFunc saved_errfunc;
+ void *saved_errcxt;
+
if (fcinfo->nargs == 3)
{
paramstr = PG_GETARG_TEXT_PP(2);
@@ -74,35 +78,45 @@ xslt_process(PG_FUNCTION_ARGS)
}
/* Setup parser */
- xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
+ xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
+
+ /*
+ * Save the previous libxslt error context.
+ */
+ saved_errfunc = xsltGenericError;
+ saved_errcxt = xsltGenericErrorContext;
+ xsltSetGenericErrorFunc(xmlerrcxt, xml_generic_error_handler);
PG_TRY();
{
xmlDocPtr ssdoc;
bool xslt_sec_prefs_error;
- /* Parse document */
+ /*
+ * Parse document. It's important to set an "URL", so libxslt includes
+ * line numbers in error messages (cf. xsltPrintErrorContext()).
+ */
doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), NULL, NULL,
+ VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
XML_PARSE_NOENT);
- if (doctree == NULL)
+ if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing XML document");
/* Same for stylesheet */
ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), NULL, NULL,
+ VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
XML_PARSE_NOENT);
- if (ssdoc == NULL)
+ if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing stylesheet as XML document");
/* After this call we need not free ssdoc separately */
stylesheet = xsltParseStylesheetDoc(ssdoc);
- if (stylesheet == NULL)
+ if (stylesheet == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
"failed to parse stylesheet");
@@ -137,11 +151,14 @@ xslt_process(PG_FUNCTION_ARGS)
restree = xsltApplyStylesheetUser(stylesheet, doctree, params,
NULL, NULL, xslt_ctxt);
- if (restree == NULL)
+ if (restree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
"failed to apply stylesheet");
resstat = xsltSaveResultToString(&resstr, &reslen, restree, stylesheet);
+ if (resstat < 0 || pg_xml_error_occurred(xmlerrcxt))
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
+ "failed to save result to string");
}
PG_CATCH();
{
@@ -157,6 +174,7 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFreeDoc(doctree);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, true);
PG_RE_THROW();
@@ -170,12 +188,9 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFreeDoc(doctree);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, false);
- /* XXX this is pretty dubious, really ought to throw error instead */
- if (resstat < 0)
- PG_RETURN_NULL();
-
result = cstring_to_text_with_len((char *) resstr, reslen);
if (resstr)
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index db8d0d6..61b0e2e 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -2078,6 +2078,35 @@ xml_errsave(Node *escontext, PgXmlErrorContext *errcxt,
detail ? errdetail_internal("%s", detail) : 0));
}
+/*
+ * Generic error handler for libxml errors and warnings.
+ * This is not used by this module, but may be useful for
+ * libxml-based libraries like libxslt, which do not support
+ * structured error handlers.
+ */
+void
+xml_generic_error_handler(void *data, const char *msg,...)
+{
+ PgXmlErrorContext *xmlerrcxt = (PgXmlErrorContext *) data;
+ va_list ap;
+
+ /*
+ * Defend against someone passing us a bogus context struct.
+ *
+ * We force a backend exit if this check fails because longjmp'ing out of
+ * libxslt would likely render it unsafe to use further.
+ */
+ if (xmlerrcxt->magic != ERRCXT_MAGIC)
+ elog(FATAL, "xml_generic_error_handler called with invalid PgXmlErrorContext");
+
+ appendStringInfoLineSeparator(&xmlerrcxt->err_buf);
+ va_start(ap, msg);
+ appendStringInfoVA(&xmlerrcxt->err_buf, msg, ap);
+ va_end(ap);
+
+ /* Get rid of any trailing newlines in errorBuf */
+ chopStringInfoNewlines(&xmlerrcxt->err_buf);
+}
/*
* Error handler for libxml errors and warnings
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 0d7a816..7cb101e 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -66,6 +66,8 @@ extern void pg_xml_init_library(void);
extern PgXmlErrorContext *pg_xml_init(PgXmlStrictness strictness);
extern void pg_xml_done(PgXmlErrorContext *errcxt, bool isError);
extern bool pg_xml_error_occurred(PgXmlErrorContext *errcxt);
+extern void xml_generic_error_handler(void *data, const char *msg,...)
+ pg_attribute_printf(2, 3);
extern void xml_ereport(PgXmlErrorContext *errcxt, int level, int sqlcode,
const char *msg);
--
2.48.1
On Wed Apr 23, 2025 at 00:42:06 GMT +03, Robin Haberkorn wrote:
/*
* Legacy error handling mode. err_occurred is never set, we just add the
* message to err_buf. This mode exists because the xml2 contrib module
* uses our error-handling infrastructure, but we don't want to change its
* behaviour since it's deprecated anyway. This is also why we don't
* distinguish between notices, warnings and errors here --- the old-style
* generic error handler wouldn't have done that either.
*/
btw. you wrote that comment in cacd42d62cb2ddf32135b151f627780a5509780f
back in 2011.
The commit message also contains some background information.
--
Robin Haberkorn
Senior Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
"Robin Haberkorn" <haberkorn@b1-systems.de> writes:
btw. you wrote that comment in cacd42d62cb2ddf32135b151f627780a5509780f
back in 2011.
The commit message also contains some background information.
Hm. I wonder if we couldn't get rid of the
HAVE_XMLSTRUCTUREDERRORCONTEXT conditionalization. If the
libxml2 versions that lacked that were "old" in 2011, surely
they are extinct in the wild by now. I'm thinking that we
really don't want to be using the generic handler at all, given
the commit log's comment that it can't distinguish between
error, warning, and notice cases.
regards, tom lane
On Thu Apr 24, 2025 at 01:59:10 GMT +03, Tom Lane wrote:
Hm. I wonder if we couldn't get rid of the
HAVE_XMLSTRUCTUREDERRORCONTEXT conditionalization. If the
libxml2 versions that lacked that were "old" in 2011, surely
they are extinct in the wild by now. I'm thinking that we
really don't want to be using the generic handler at all, given
the commit log's comment that it can't distinguish between
error, warning, and notice cases.
libxml 2.7.4 was released 15 years ago, so yes these #ifdefs can
probably be removed already. What's the oldest distribution/OS you
want to support PG on?
Even Ubuntu 14.04 from 2014 already had libxml 2.9.1.
If you like, I will prepare another patch to remove
HAVE_XMLSTRUCTUREDERRORCONTEXT in a separate thread.
PS: I added the libxslt error handling patch to the next commitfest:
https://commitfest.postgresql.org/patch/5718/
--
Robin Haberkorn
Senior Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
On Tue Apr 22, 2025 at 17:47:20 GMT +03, Robin Haberkorn wrote:
Perhaps you can tell me what else is preventing adoption into core. I believe
that xslt_process() should also accept the `xml` type as an alternative to
strings. Strings should be kept for backwards compatibility, though. Also,
the stylesheet parameter passing is very crude and limited. I suggest, that
the current method of passing them as strings (see parse_params()) is
deprecated and we don't try to tackle its limitations, but will instead
accept hstores. If you agree, I will work on these two tasks next.
On the other hand, since hstore is a module, it seems we cannot really
have any other module or core depend on it, correct?
Perhaps the xmltype-variant of xslt_process() could accept string arrays
instead.
--
Robin Haberkorn
Senior Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
Hello again,
I did prepare another patch, which introduces an xmltype-based alternative
for xslt_process(). I would really like to put that up for discussion. IMHO
with this patch, xslt_process() should be ready to be adopted into core.
Unfortunately, it cleanly applies only on top of the error handling patch I
provided in this thread earlier and it wouldn't really make sense to create a
version that applies on top of master. It just touches the same code as the
previous patch, so they necessarily have to be applied in a certain order. So
I am unsure how to go on.
I attached both patches in question as a series and would suggest to just
update the Commitfest entry. On the other hand, you might well accept the
first patch, but reject the second one.
Yours sincerely,
Robin Haberkorn
On Mon Apr 28, 2025 at 14:12:22 GMT +03, Robin Haberkorn wrote:
On Tue Apr 22, 2025 at 17:47:20 GMT +03, Robin Haberkorn wrote:
Perhaps you can tell me what else is preventing adoption into core. I believe
that xslt_process() should also accept the `xml` type as an alternative to
strings. Strings should be kept for backwards compatibility, though. Also,
the stylesheet parameter passing is very crude and limited. I suggest, that
the current method of passing them as strings (see parse_params()) is
deprecated and we don't try to tackle its limitations, but will instead
accept hstores. If you agree, I will work on these two tasks next.On the other hand, since hstore is a module, it seems we cannot really
have any other module or core depend on it, correct?
Perhaps the xmltype-variant of xslt_process() could accept string arrays
instead.
--
Robin Haberkorn
Senior Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
Attachments:
v3-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchtext/x-patch; charset=utf-8; name=v3-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchDownload
From fa55bf28a2a45ae91a3363792b39e77d7c055cd2 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Thu, 17 Apr 2025 13:15:48 +0300
Subject: [PATCH v3 1/2] contrib/xml2: xslt_process() now reports XSLT-related
error details
* It sets (and restores) the libxslt error handler (xsltSetGenericErrorFunc()).
Since it only supports old-school "generic" error handlers, which are no longer
used in PG's libxml code, we reintroduced a "generic" error handler
xml_generic_error_handler() in xml.c.
* The alternative would have been to expose PgXmlErrorContext in xml.h,
so we could implement a generic handler in xslt_proc.c.
This is obviously not desirable, as it would depend on USE_LIBXML.
* No longer use the PG_XML_STRICTNESS_LEGACY for error handling,
but query the error_occurred-flag via pg_xml_error_occurred().
The existance of this getter is another hint, that we are not supposed
to expose PgXmlErrorContext in xml.h.
* This change means that xslt_process() now reports not only details
about XML parsing errors, but XSLT-schema deviations and missing
stylesheet parameters as well.
* The XSLT error messages now also contain line numbers.
For that to work, we had to set a dummy "SQL" URL when parsing XML strings.
This is important, since xsltPrintErrorContext() includes
line numbers only if an URL is set.
* The xsltSaveResultToString() error handling has been removed.
It can practically only fail in OOM situations and there is no reason
to handle them any different than with the other libxslt functions.
* Updated test suite and added test case for detecting missing
stylesheet parameters.
This was initially reported here but has obviously been fixed in the
meantime:
https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com
---
contrib/xml2/expected/xml2.out | 17 +++++++++++++++
contrib/xml2/sql/xml2.sql | 8 +++++++
contrib/xml2/xslt_proc.c | 39 +++++++++++++++++++++++-----------
src/backend/utils/adt/xml.c | 29 +++++++++++++++++++++++++
src/include/utils/xml.h | 2 ++
5 files changed, 83 insertions(+), 12 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 3d97b14c3a..157d584e63 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -261,3 +261,20 @@ $$<xsl:stylesheet version="1.0"
</xsl:template>
</xsl:stylesheet>$$);
ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 7 element output
+File write for 0wn3d.txt refused
+runtime error: file SQL line 7 element output
+xsltDocumentElem: write rights for 0wn3d.txt denied
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
+ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 3 element value-of
+Variable 'n1' has not been declared.
+Undefined variable
+runtime error: file SQL line 3 element value-of
+XPath evaluation returned no result.
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index ef99d164f2..9d42ac8a0b 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -153,3 +153,11 @@ $$<xsl:stylesheet version="1.0"
</sax:output>
</xsl:template>
</xsl:stylesheet>$$);
+
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index b720d89f75..2aa44d39e7 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -61,6 +61,10 @@ xslt_process(PG_FUNCTION_ARGS)
xmlChar *resstr = NULL;
int reslen = 0;
+ /* the previous libxslt error context */
+ xmlGenericErrorFunc saved_errfunc;
+ void *saved_errcxt;
+
if (fcinfo->nargs == 3)
{
paramstr = PG_GETARG_TEXT_PP(2);
@@ -74,35 +78,45 @@ xslt_process(PG_FUNCTION_ARGS)
}
/* Setup parser */
- xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
+ xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
+
+ /*
+ * Save the previous libxslt error context.
+ */
+ saved_errfunc = xsltGenericError;
+ saved_errcxt = xsltGenericErrorContext;
+ xsltSetGenericErrorFunc(xmlerrcxt, xml_generic_error_handler);
PG_TRY();
{
xmlDocPtr ssdoc;
bool xslt_sec_prefs_error;
- /* Parse document */
+ /*
+ * Parse document. It's important to set an "URL", so libxslt includes
+ * line numbers in error messages (cf. xsltPrintErrorContext()).
+ */
doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), NULL, NULL,
+ VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
XML_PARSE_NOENT);
- if (doctree == NULL)
+ if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing XML document");
/* Same for stylesheet */
ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), NULL, NULL,
+ VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
XML_PARSE_NOENT);
- if (ssdoc == NULL)
+ if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing stylesheet as XML document");
/* After this call we need not free ssdoc separately */
stylesheet = xsltParseStylesheetDoc(ssdoc);
- if (stylesheet == NULL)
+ if (stylesheet == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
"failed to parse stylesheet");
@@ -137,11 +151,14 @@ xslt_process(PG_FUNCTION_ARGS)
restree = xsltApplyStylesheetUser(stylesheet, doctree, params,
NULL, NULL, xslt_ctxt);
- if (restree == NULL)
+ if (restree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
"failed to apply stylesheet");
resstat = xsltSaveResultToString(&resstr, &reslen, restree, stylesheet);
+ if (resstat < 0 || pg_xml_error_occurred(xmlerrcxt))
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
+ "failed to save result to string");
}
PG_CATCH();
{
@@ -157,6 +174,7 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFreeDoc(doctree);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, true);
PG_RE_THROW();
@@ -170,12 +188,9 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFreeDoc(doctree);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, false);
- /* XXX this is pretty dubious, really ought to throw error instead */
- if (resstat < 0)
- PG_RETURN_NULL();
-
result = cstring_to_text_with_len((char *) resstr, reslen);
if (resstr)
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index db8d0d6a7e..61b0e2e117 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -2078,6 +2078,35 @@ xml_errsave(Node *escontext, PgXmlErrorContext *errcxt,
detail ? errdetail_internal("%s", detail) : 0));
}
+/*
+ * Generic error handler for libxml errors and warnings.
+ * This is not used by this module, but may be useful for
+ * libxml-based libraries like libxslt, which do not support
+ * structured error handlers.
+ */
+void
+xml_generic_error_handler(void *data, const char *msg,...)
+{
+ PgXmlErrorContext *xmlerrcxt = (PgXmlErrorContext *) data;
+ va_list ap;
+
+ /*
+ * Defend against someone passing us a bogus context struct.
+ *
+ * We force a backend exit if this check fails because longjmp'ing out of
+ * libxslt would likely render it unsafe to use further.
+ */
+ if (xmlerrcxt->magic != ERRCXT_MAGIC)
+ elog(FATAL, "xml_generic_error_handler called with invalid PgXmlErrorContext");
+
+ appendStringInfoLineSeparator(&xmlerrcxt->err_buf);
+ va_start(ap, msg);
+ appendStringInfoVA(&xmlerrcxt->err_buf, msg, ap);
+ va_end(ap);
+
+ /* Get rid of any trailing newlines in errorBuf */
+ chopStringInfoNewlines(&xmlerrcxt->err_buf);
+}
/*
* Error handler for libxml errors and warnings
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 0d7a816b9f..7cb101e81d 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -66,6 +66,8 @@ extern void pg_xml_init_library(void);
extern PgXmlErrorContext *pg_xml_init(PgXmlStrictness strictness);
extern void pg_xml_done(PgXmlErrorContext *errcxt, bool isError);
extern bool pg_xml_error_occurred(PgXmlErrorContext *errcxt);
+extern void xml_generic_error_handler(void *data, const char *msg,...)
+ pg_attribute_printf(2, 3);
extern void xml_ereport(PgXmlErrorContext *errcxt, int level, int sqlcode,
const char *msg);
--
2.49.0
v3-0002-contrib-xml2-overloaded-xslt_process-to-provide-v.patchtext/x-patch; charset=utf-8; name=v3-0002-contrib-xml2-overloaded-xslt_process-to-provide-v.patchDownload
From 7d77b0afd7f4b99baa93fbc9dd6ddd21a1a4c560 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Wed, 30 Apr 2025 13:19:42 +0300
Subject: [PATCH v3 2/2] contrib/xml2: overloaded xslt_process() to provide
variants for xmltype and specifying parameters in arrays
* There are apparently no functions that accept XML as text, except for xmlparse().
xslt_process() should therefore also accept xmltype.
* A version accepting text is still kept for backwards compatibility, but is considered
deprecated.
* The new xmltype-based version expects an array of stylesheet parameter-value pairs,
which is less limited than the now deprecated way of encoding all stylesheet parameters into
a single text argument.
We can now accept an arbitrary number of parameters and you can include `=` and `,` signs
in both the key and value strings.
Hstores haven't been used since they are in a module and we don't want to depend on any
additional module.
* The new implementation respects the database's encoding - text strings are always converted to UTF8
before passing them into libxml2.
* On the downside, xml_parse() had to be made an external function.
Since a declaration cannot be added to xml.h without drawing in libxml2 headers,
the declaration is repeated in xslt_proc.c.
Perhaps xml_parse() should be declared in a separate internal header?
* xmlCtxtReadDoc() now sets a dummy "SQL" URL to preserve line numbers in XSLT stylesheet errors.
This change at least does not break the test suite.
---
contrib/xml2/expected/xml2.out | 13 +++
contrib/xml2/sql/xml2.sql | 8 ++
contrib/xml2/xml2--1.1.sql | 11 +++
contrib/xml2/xslt_proc.c | 147 ++++++++++++++++++++++++++-------
doc/src/sgml/xml2.sgml | 19 +++--
src/backend/utils/adt/xml.c | 19 +++--
6 files changed, 174 insertions(+), 43 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 157d584e63..0a8a628020 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -278,3 +278,16 @@ Variable 'n1' has not been declared.
Undefined variable
runtime error: file SQL line 3 element value-of
XPath evaluation returned no result.
+-- xmltype and Array-based signature
+SELECT xslt_process(xmlelement(name xml),
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$::xml, ARRAY['n1','"foo"']);
+ xslt_process
+--------------
+ foo +
+
+(1 row)
+
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index 9d42ac8a0b..7555854d49 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -161,3 +161,11 @@ $$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<value-of select="$n1"/>
</template>
</stylesheet>$$)::xml;
+
+-- xmltype and Array-based signature
+SELECT xslt_process(xmlelement(name xml),
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$::xml, ARRAY['n1','"foo"']);
diff --git a/contrib/xml2/xml2--1.1.sql b/contrib/xml2/xml2--1.1.sql
index 671372cb27..a579a1e5e1 100644
--- a/contrib/xml2/xml2--1.1.sql
+++ b/contrib/xml2/xml2--1.1.sql
@@ -71,3 +71,14 @@ CREATE FUNCTION xslt_process(text,text)
RETURNS text
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION xslt_process(xml,xml,text[])
+RETURNS xml
+AS 'MODULE_PATHNAME','xslt_process_xmltype'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+-- the function checks for the correct argument count
+CREATE FUNCTION xslt_process(xml,xml)
+RETURNS xml
+AS 'MODULE_PATHNAME','xslt_process_xmltype'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index 2aa44d39e7..2a29b13f88 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -10,6 +10,9 @@
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/xml.h"
+#include "utils/array.h"
+#include "utils/memutils.h"
+#include "mb/pg_wchar.h"
#ifdef USE_LIBXSLT
@@ -35,9 +38,18 @@
extern PgXmlErrorContext *pgxml_parser_init(PgXmlStrictness strictness);
/* local defs */
+static xmltype *xslt_process_internal(xmltype *doct, xmltype *ssheet, const char **params);
static const char **parse_params(text *paramstr);
#endif /* USE_LIBXSLT */
+/*
+ * FIXME: This cannot easily be exposed in xml.h.
+ * Perhaps there should be an xml-internal.h?
+ */
+xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
+ bool preserve_whitespace, int encoding,
+ XmlOptionType *parsed_xmloptiontype, xmlNodePtr *parsed_nodes,
+ Node *escontext);
PG_FUNCTION_INFO_V1(xslt_process);
@@ -48,9 +60,103 @@ xslt_process(PG_FUNCTION_ARGS)
text *doct = PG_GETARG_TEXT_PP(0);
text *ssheet = PG_GETARG_TEXT_PP(1);
+ const char **params = NULL;
+ text *result;
+
+ if (fcinfo->nargs == 3)
+ {
+ text *paramstr = PG_GETARG_TEXT_PP(2);
+
+ params = parse_params(paramstr);
+ }
+
+ result = xslt_process_internal(doct, ssheet, params);
+
+ PG_RETURN_TEXT_P(result);
+
+#else /* !USE_LIBXSLT */
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("xslt_process() is not available without libxslt")));
+ PG_RETURN_NULL();
+
+#endif /* USE_LIBXSLT */
+}
+
+PG_FUNCTION_INFO_V1(xslt_process_xmltype);
+
+Datum
+xslt_process_xmltype(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXSLT
+
+ xmltype *doct = PG_GETARG_XML_P(0);
+ xmltype *ssheet = PG_GETARG_XML_P(1);
+ const char **params = NULL;
+ xmltype *result;
+
+ /*
+ * Parameters are key-value pairs. The values are XPath expressions, so
+ * strings will have to be escaped with single or double quotes. Even
+ * `xsltproc --stringparam` does nothing else than adding single or double
+ * quotes and fails if the value contains both.
+ */
+ if (fcinfo->nargs == 3)
+ {
+ ArrayType *paramarray = PG_GETARG_ARRAYTYPE_P(2);
+ Datum *arr_datums;
+ bool *arr_nulls;
+ int arr_count;
+ int i,
+ j;
+
+ deconstruct_array_builtin(paramarray, TEXTOID, &arr_datums, &arr_nulls, &arr_count);
+
+ if ((arr_count % 2) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_ARRAY_ELEMENT_ERROR),
+ errmsg("number of stylesheet parameters (%d) must be a multiple of 2",
+ arr_count)));
+
+ params = palloc_array(const char *, arr_count + 1);
+
+ for (i = 0, j = 0; i < arr_count; i++)
+ {
+ char *cstr;
+
+ if (arr_nulls[i])
+ continue;
+
+ cstr = TextDatumGetCString(arr_datums[i]);
+ params[j++] = (char *) pg_do_encoding_conversion((unsigned char *) cstr,
+ strlen(cstr),
+ GetDatabaseEncoding(),
+ PG_UTF8);
+ }
+ params[j] = NULL;
+ }
+
+ result = xslt_process_internal(doct, ssheet, params);
+
+ PG_RETURN_XML_P(result);
+
+#else /* !USE_LIBXSLT */
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("xslt_process() is not available without libxslt")));
+ PG_RETURN_NULL();
+
+#endif /* USE_LIBXSLT */
+}
+
+#ifdef USE_LIBXSLT
+
+static xmltype *
+xslt_process_internal(xmltype *doct, xmltype *ssheet, const char **params)
+{
text *result;
- text *paramstr;
- const char **params;
PgXmlErrorContext *xmlerrcxt;
volatile xsltStylesheetPtr stylesheet = NULL;
volatile xmlDocPtr doctree = NULL;
@@ -65,18 +171,6 @@ xslt_process(PG_FUNCTION_ARGS)
xmlGenericErrorFunc saved_errfunc;
void *saved_errcxt;
- if (fcinfo->nargs == 3)
- {
- paramstr = PG_GETARG_TEXT_PP(2);
- params = parse_params(paramstr);
- }
- else
- {
- /* No parameters */
- params = (const char **) palloc(sizeof(char *));
- params[0] = NULL;
- }
-
/* Setup parser */
xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
@@ -96,18 +190,16 @@ xslt_process(PG_FUNCTION_ARGS)
* Parse document. It's important to set an "URL", so libxslt includes
* line numbers in error messages (cf. xsltPrintErrorContext()).
*/
- doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
- XML_PARSE_NOENT);
+ doctree = xml_parse(doct, XMLOPTION_DOCUMENT, true,
+ GetDatabaseEncoding(), NULL, NULL, NULL);
if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing XML document");
/* Same for stylesheet */
- ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
- XML_PARSE_NOENT);
+ ssdoc = xml_parse(ssheet, XMLOPTION_DOCUMENT, true,
+ GetDatabaseEncoding(), NULL, NULL, NULL);
if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
@@ -155,6 +247,10 @@ xslt_process(PG_FUNCTION_ARGS)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
"failed to apply stylesheet");
+ /*
+ * FIXME: There is probably a function to convert the string to the
+ * correct database encoding.
+ */
resstat = xsltSaveResultToString(&resstr, &reslen, restree, stylesheet);
if (resstat < 0 || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
@@ -196,18 +292,9 @@ xslt_process(PG_FUNCTION_ARGS)
if (resstr)
xmlFree(resstr);
- PG_RETURN_TEXT_P(result);
-#else /* !USE_LIBXSLT */
-
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("xslt_process() is not available without libxslt")));
- PG_RETURN_NULL();
-#endif /* USE_LIBXSLT */
+ return result;
}
-#ifdef USE_LIBXSLT
-
static const char **
parse_params(text *paramstr)
{
diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml
index 9fd613f967..dc6fb40121 100644
--- a/doc/src/sgml/xml2.sgml
+++ b/doc/src/sgml/xml2.sgml
@@ -408,22 +408,29 @@ ORDER BY doc_num, line_num;
</indexterm>
<synopsis>
-xslt_process(text document, text stylesheet, text paramlist) returns text
+xslt_process(xml document, xml stylesheet, text[] paramlist) returns xml
</synopsis>
<para>
This function applies the XSL stylesheet to the document and returns
- the transformed result. The <literal>paramlist</literal> is a list of parameter
- assignments to be used in the transformation, specified in the form
- <literal>a=1,b=2</literal>. Note that the
- parameter parsing is very simple-minded: parameter values cannot
- contain commas!
+ the transformed result. The <literal>paramlist</literal> is an array of parameter
+ assignments to be used in the transformation, specified in pairs of
+ key and value strings (e.g. <literal>ARRAY['a','1', 'b','2']</literal>).
+ The length of the array must be even.
+ Note that the values are still interpreted as XPath expressions, so string values need to
+ be quoted in single or double quotes (e.g. <literal>ARRAY['a','"string"']</literal>).
</para>
<para>
There is also a two-parameter version of <function>xslt_process</function> which
does not pass any parameters to the transformation.
</para>
+
+ <para>
+ <emphasis>Deprecated</emphasis> variants of <function>xslt_process</function> accepting
+ text arguments and parameters encoded into single text strings
+ (e.g. <literal>a=1,b=2</literal>) are also still available.
+ </para>
</sect3>
</sect2>
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 61b0e2e117..08faa0dc86 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -155,11 +155,11 @@ static int parse_xml_decl(const xmlChar *str, size_t *lenp,
static bool print_xml_decl(StringInfo buf, const xmlChar *version,
pg_enc encoding, int standalone);
static bool xml_doctype_in_content(const xmlChar *str);
-static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
- bool preserve_whitespace, int encoding,
- XmlOptionType *parsed_xmloptiontype,
- xmlNodePtr *parsed_nodes,
- Node *escontext);
+xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
+ bool preserve_whitespace, int encoding,
+ XmlOptionType *parsed_xmloptiontype,
+ xmlNodePtr *parsed_nodes,
+ Node *escontext);
static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
ArrayBuildState *astate,
@@ -1732,7 +1732,7 @@ xml_doctype_in_content(const xmlChar *str)
* TODO maybe libxml2's xmlreader is better? (do not construct DOM,
* yet do not use SAX - see xmlreader.c)
*/
-static xmlDocPtr
+xmlDocPtr
xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding,
XmlOptionType *parsed_xmloptiontype, xmlNodePtr *parsed_nodes,
@@ -1828,8 +1828,13 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate parser context");
+ /*
+ * Setting a dummy "SQL" URL is important for the
+ * xsltPrintErrorContext() when using the legacy text-based
+ * xslt_process() variant.
+ */
doc = xmlCtxtReadDoc(ctxt, utf8string,
- NULL, /* no URL */
+ "SQL",
"UTF-8",
options);
--
2.49.0
After the changes introduced by Michael Paquier (commited as
732061150b004385810e522f8629f5bf91d977b7) I had to rebase my patches. Among
other things pg_xml_error_occurred() was already called except after
xsltSaveResultToString(). From what I understand, it's not really necessary
to mangle the encoding of the strings that underlie xmltype, so I removed
a FIXME comment in the 2nd patch compared to v3.
IMHO the PG_XML_STRICTNESS_LEGACY can now also be removed from xpath.c,
but I will raise this question in the appropriate thread again.
Best regards,
Robin Haberkorn
On Wed May 7, 2025 at 11:16:37 GMT +03, Robin Haberkorn wrote:
Hello again,
I did prepare another patch, which introduces an xmltype-based alternative
for xslt_process(). I would really like to put that up for discussion. IMHO
with this patch, xslt_process() should be ready to be adopted into core.Unfortunately, it cleanly applies only on top of the error handling patch I
provided in this thread earlier and it wouldn't really make sense to create a
version that applies on top of master. It just touches the same code as the
previous patch, so they necessarily have to be applied in a certain order. So
I am unsure how to go on.I attached both patches in question as a series and would suggest to just
update the Commitfest entry. On the other hand, you might well accept the
first patch, but reject the second one.Yours sincerely,
Robin HaberkornOn Mon Apr 28, 2025 at 14:12:22 GMT +03, Robin Haberkorn wrote:
On Tue Apr 22, 2025 at 17:47:20 GMT +03, Robin Haberkorn wrote:
Perhaps you can tell me what else is preventing adoption into core. I believe
that xslt_process() should also accept the `xml` type as an alternative to
strings. Strings should be kept for backwards compatibility, though. Also,
the stylesheet parameter passing is very crude and limited. I suggest, that
the current method of passing them as strings (see parse_params()) is
deprecated and we don't try to tackle its limitations, but will instead
accept hstores. If you agree, I will work on these two tasks next.On the other hand, since hstore is a module, it seems we cannot really
have any other module or core depend on it, correct?
Perhaps the xmltype-variant of xslt_process() could accept string arrays
instead.
--
Robin Haberkorn
Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
Attachments:
v4-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchtext/x-patch; charset=utf-8; name=v4-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchDownload
From 237ae8788150249699e17fda2e47363d917b5ffe Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Thu, 17 Apr 2025 13:15:48 +0300
Subject: [PATCH v4 1/2] contrib/xml2: xslt_process() now reports XSLT-related
error details
* It sets (and restores) the libxslt error handler (xsltSetGenericErrorFunc()).
Since it only supports old-school "generic" error handlers, which are no longer
used in PG's libxml code, we reintroduced a "generic" error handler
xml_generic_error_handler() in xml.c.
* The alternative would have been to expose PgXmlErrorContext in xml.h,
so we could implement a generic handler in xslt_proc.c.
This is obviously not desirable, as it would depend on USE_LIBXML.
* No longer use the PG_XML_STRICTNESS_LEGACY for error handling as
the err_occurred flag was already checked via pg_xml_error_occurred()
since 732061150b004385810e522f8629f5bf91d977b7.
* This change means that xslt_process() now reports not only details
about XML parsing errors, but XSLT-schema deviations and missing
stylesheet parameters as well.
* The XSLT error messages now also contain line numbers.
For that to work, we had to set a dummy "SQL" URL when parsing XML strings.
This is important, since xsltPrintErrorContext() includes
line numbers only if an URL is set.
* The special xsltSaveResultToString() error handling has been removed.
It can practically only fail in OOM situations and there is no reason
to handle them any different than with the other libxslt functions.
* Updated test suite and added test case for detecting missing
stylesheet parameters.
This was initially reported here but has obviously been fixed in the
meantime:
https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com
---
contrib/xml2/expected/xml2.out | 17 +++++++++++++++++
contrib/xml2/sql/xml2.sql | 8 ++++++++
contrib/xml2/xslt_proc.c | 35 +++++++++++++++++++++++-----------
src/backend/utils/adt/xml.c | 29 ++++++++++++++++++++++++++++
src/include/utils/xml.h | 2 ++
5 files changed, 80 insertions(+), 11 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 3d97b14c3a..157d584e63 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -261,3 +261,20 @@ $$<xsl:stylesheet version="1.0"
</xsl:template>
</xsl:stylesheet>$$);
ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 7 element output
+File write for 0wn3d.txt refused
+runtime error: file SQL line 7 element output
+xsltDocumentElem: write rights for 0wn3d.txt denied
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
+ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 3 element value-of
+Variable 'n1' has not been declared.
+Undefined variable
+runtime error: file SQL line 3 element value-of
+XPath evaluation returned no result.
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index ef99d164f2..9d42ac8a0b 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -153,3 +153,11 @@ $$<xsl:stylesheet version="1.0"
</sax:output>
</xsl:template>
</xsl:stylesheet>$$);
+
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index c8e7dd45ed..5efb659439 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -61,6 +61,10 @@ xslt_process(PG_FUNCTION_ARGS)
volatile xmlChar *resstr = NULL;
int reslen = 0;
+ /* the previous libxslt error context */
+ xmlGenericErrorFunc saved_errfunc;
+ void *saved_errcxt;
+
if (fcinfo->nargs == 3)
{
paramstr = PG_GETARG_TEXT_PP(2);
@@ -74,16 +78,26 @@ xslt_process(PG_FUNCTION_ARGS)
}
/* Setup parser */
- xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
+ xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
+
+ /*
+ * Save the previous libxslt error context.
+ */
+ saved_errfunc = xsltGenericError;
+ saved_errcxt = xsltGenericErrorContext;
+ xsltSetGenericErrorFunc(xmlerrcxt, xml_generic_error_handler);
PG_TRY();
{
xmlDocPtr ssdoc;
bool xslt_sec_prefs_error;
- /* Parse document */
+ /*
+ * Parse document. It's important to set an "URL", so libxslt includes
+ * line numbers in error messages (cf. xsltPrintErrorContext()).
+ */
doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), NULL, NULL,
+ VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
XML_PARSE_NOENT);
if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
@@ -92,7 +106,7 @@ xslt_process(PG_FUNCTION_ARGS)
/* Same for stylesheet */
ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), NULL, NULL,
+ VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
XML_PARSE_NOENT);
if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
@@ -143,9 +157,10 @@ xslt_process(PG_FUNCTION_ARGS)
resstat = xsltSaveResultToString((xmlChar **) &resstr, &reslen,
restree, stylesheet);
-
- if (resstat >= 0)
- result = cstring_to_text_with_len((char *) resstr, reslen);
+ if (resstat < 0 || pg_xml_error_occurred(xmlerrcxt))
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
+ "failed to save result to string");
+ result = cstring_to_text_with_len((char *) resstr, reslen);
}
PG_CATCH();
{
@@ -163,6 +178,7 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFree((xmlChar *) resstr);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, true);
PG_RE_THROW();
@@ -179,12 +195,9 @@ xslt_process(PG_FUNCTION_ARGS)
if (resstr)
xmlFree((xmlChar *) resstr);
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, false);
- /* XXX this is pretty dubious, really ought to throw error instead */
- if (resstat < 0)
- PG_RETURN_NULL();
-
PG_RETURN_TEXT_P(result);
#else /* !USE_LIBXSLT */
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 2bd39b6ac4..080d9d2ad9 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -2130,6 +2130,35 @@ xml_errsave(Node *escontext, PgXmlErrorContext *errcxt,
detail ? errdetail_internal("%s", detail) : 0));
}
+/*
+ * Generic error handler for libxml errors and warnings.
+ * This is not used by this module, but may be useful for
+ * libxml-based libraries like libxslt, which do not support
+ * structured error handlers.
+ */
+void
+xml_generic_error_handler(void *data, const char *msg,...)
+{
+ PgXmlErrorContext *xmlerrcxt = (PgXmlErrorContext *) data;
+ va_list ap;
+
+ /*
+ * Defend against someone passing us a bogus context struct.
+ *
+ * We force a backend exit if this check fails because longjmp'ing out of
+ * libxslt would likely render it unsafe to use further.
+ */
+ if (xmlerrcxt->magic != ERRCXT_MAGIC)
+ elog(FATAL, "xml_generic_error_handler called with invalid PgXmlErrorContext");
+
+ appendStringInfoLineSeparator(&xmlerrcxt->err_buf);
+ va_start(ap, msg);
+ appendStringInfoVA(&xmlerrcxt->err_buf, msg, ap);
+ va_end(ap);
+
+ /* Get rid of any trailing newlines in errorBuf */
+ chopStringInfoNewlines(&xmlerrcxt->err_buf);
+}
/*
* Error handler for libxml errors and warnings
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 0d7a816b9f..7cb101e81d 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -66,6 +66,8 @@ extern void pg_xml_init_library(void);
extern PgXmlErrorContext *pg_xml_init(PgXmlStrictness strictness);
extern void pg_xml_done(PgXmlErrorContext *errcxt, bool isError);
extern bool pg_xml_error_occurred(PgXmlErrorContext *errcxt);
+extern void xml_generic_error_handler(void *data, const char *msg,...)
+ pg_attribute_printf(2, 3);
extern void xml_ereport(PgXmlErrorContext *errcxt, int level, int sqlcode,
const char *msg);
--
2.49.0
v4-0002-contrib-xml2-overloaded-xslt_process-to-provide-v.patchtext/x-patch; charset=utf-8; name=v4-0002-contrib-xml2-overloaded-xslt_process-to-provide-v.patchDownload
From f4996f6278fc575ad67e16bf21d00f0be7907457 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Wed, 30 Apr 2025 13:19:42 +0300
Subject: [PATCH v4 2/2] contrib/xml2: overloaded xslt_process() to provide
variants for xmltype and specifying parameters in arrays
* There are apparently no functions that accept XML as text, except for xmlparse().
xslt_process() should therefore also accept xmltype.
* A version accepting text is still kept for backwards compatibility, but is considered
deprecated.
* The new xmltype-based version expects an array of stylesheet parameter-value pairs,
which is less limited than the now deprecated way of encoding all stylesheet parameters into
a single text argument.
We can now accept an arbitrary number of parameters and you can include `=` and `,` signs
in both the key and value strings.
Hstores haven't been used since they are in a module and we don't want to depend on any
additional module.
* The new implementation respects the database's encoding - text strings are always converted to UTF8
before passing them into libxml2.
* On the downside, xml_parse() had to be made an external function.
Since a declaration cannot be added to xml.h without drawing in libxml2 headers,
the declaration is repeated in xslt_proc.c.
Perhaps xml_parse() should be declared in a separate internal header?
* xmlCtxtReadDoc() now sets a dummy "SQL" URL to preserve line numbers in XSLT stylesheet errors.
This change at least does not break the test suite.
---
contrib/xml2/expected/xml2.out | 13 +++
contrib/xml2/sql/xml2.sql | 8 ++
contrib/xml2/xml2--1.1.sql | 11 +++
contrib/xml2/xslt_proc.c | 146 +++++++++++++++++++++++++--------
doc/src/sgml/xml2.sgml | 19 +++--
src/backend/utils/adt/xml.c | 19 +++--
6 files changed, 171 insertions(+), 45 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 157d584e63..0a8a628020 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -278,3 +278,16 @@ Variable 'n1' has not been declared.
Undefined variable
runtime error: file SQL line 3 element value-of
XPath evaluation returned no result.
+-- xmltype and Array-based signature
+SELECT xslt_process(xmlelement(name xml),
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$::xml, ARRAY['n1','"foo"']);
+ xslt_process
+--------------
+ foo +
+
+(1 row)
+
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index 9d42ac8a0b..7555854d49 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -161,3 +161,11 @@ $$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<value-of select="$n1"/>
</template>
</stylesheet>$$)::xml;
+
+-- xmltype and Array-based signature
+SELECT xslt_process(xmlelement(name xml),
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$::xml, ARRAY['n1','"foo"']);
diff --git a/contrib/xml2/xml2--1.1.sql b/contrib/xml2/xml2--1.1.sql
index 671372cb27..a579a1e5e1 100644
--- a/contrib/xml2/xml2--1.1.sql
+++ b/contrib/xml2/xml2--1.1.sql
@@ -71,3 +71,14 @@ CREATE FUNCTION xslt_process(text,text)
RETURNS text
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION xslt_process(xml,xml,text[])
+RETURNS xml
+AS 'MODULE_PATHNAME','xslt_process_xmltype'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+-- the function checks for the correct argument count
+CREATE FUNCTION xslt_process(xml,xml)
+RETURNS xml
+AS 'MODULE_PATHNAME','xslt_process_xmltype'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index 5efb659439..c31ed9d039 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -10,6 +10,9 @@
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/xml.h"
+#include "utils/array.h"
+#include "utils/memutils.h"
+#include "mb/pg_wchar.h"
#ifdef USE_LIBXSLT
@@ -35,9 +38,18 @@
extern PgXmlErrorContext *pgxml_parser_init(PgXmlStrictness strictness);
/* local defs */
+static xmltype *xslt_process_internal(xmltype *doct, xmltype *ssheet, const char **params);
static const char **parse_params(text *paramstr);
#endif /* USE_LIBXSLT */
+/*
+ * FIXME: This cannot easily be exposed in xml.h.
+ * Perhaps there should be an xml-internal.h?
+ */
+xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
+ bool preserve_whitespace, int encoding,
+ XmlOptionType *parsed_xmloptiontype, xmlNodePtr *parsed_nodes,
+ Node *escontext);
PG_FUNCTION_INFO_V1(xslt_process);
@@ -48,9 +60,103 @@ xslt_process(PG_FUNCTION_ARGS)
text *doct = PG_GETARG_TEXT_PP(0);
text *ssheet = PG_GETARG_TEXT_PP(1);
+ const char **params = NULL;
+ text *result;
+
+ if (fcinfo->nargs == 3)
+ {
+ text *paramstr = PG_GETARG_TEXT_PP(2);
+
+ params = parse_params(paramstr);
+ }
+
+ result = xslt_process_internal(doct, ssheet, params);
+
+ PG_RETURN_TEXT_P(result);
+
+#else /* !USE_LIBXSLT */
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("xslt_process() is not available without libxslt")));
+ PG_RETURN_NULL();
+
+#endif /* USE_LIBXSLT */
+}
+
+PG_FUNCTION_INFO_V1(xslt_process_xmltype);
+
+Datum
+xslt_process_xmltype(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXSLT
+
+ xmltype *doct = PG_GETARG_XML_P(0);
+ xmltype *ssheet = PG_GETARG_XML_P(1);
+ const char **params = NULL;
+ xmltype *result;
+
+ /*
+ * Parameters are key-value pairs. The values are XPath expressions, so
+ * strings will have to be escaped with single or double quotes. Even
+ * `xsltproc --stringparam` does nothing else than adding single or double
+ * quotes and fails if the value contains both.
+ */
+ if (fcinfo->nargs == 3)
+ {
+ ArrayType *paramarray = PG_GETARG_ARRAYTYPE_P(2);
+ Datum *arr_datums;
+ bool *arr_nulls;
+ int arr_count;
+ int i,
+ j;
+
+ deconstruct_array_builtin(paramarray, TEXTOID, &arr_datums, &arr_nulls, &arr_count);
+
+ if ((arr_count % 2) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_ARRAY_ELEMENT_ERROR),
+ errmsg("number of stylesheet parameters (%d) must be a multiple of 2",
+ arr_count)));
+
+ params = palloc_array(const char *, arr_count + 1);
+
+ for (i = 0, j = 0; i < arr_count; i++)
+ {
+ char *cstr;
+
+ if (arr_nulls[i])
+ continue;
+
+ cstr = TextDatumGetCString(arr_datums[i]);
+ params[j++] = (char *) pg_do_encoding_conversion((unsigned char *) cstr,
+ strlen(cstr),
+ GetDatabaseEncoding(),
+ PG_UTF8);
+ }
+ params[j] = NULL;
+ }
+
+ result = xslt_process_internal(doct, ssheet, params);
+
+ PG_RETURN_XML_P(result);
+
+#else /* !USE_LIBXSLT */
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("xslt_process() is not available without libxslt")));
+ PG_RETURN_NULL();
+
+#endif /* USE_LIBXSLT */
+}
+
+#ifdef USE_LIBXSLT
+
+static xmltype *
+xslt_process_internal(xmltype *doct, xmltype *ssheet, const char **params)
+{
text *result;
- text *paramstr;
- const char **params;
PgXmlErrorContext *xmlerrcxt;
volatile xsltStylesheetPtr stylesheet = NULL;
volatile xmlDocPtr doctree = NULL;
@@ -65,18 +171,6 @@ xslt_process(PG_FUNCTION_ARGS)
xmlGenericErrorFunc saved_errfunc;
void *saved_errcxt;
- if (fcinfo->nargs == 3)
- {
- paramstr = PG_GETARG_TEXT_PP(2);
- params = parse_params(paramstr);
- }
- else
- {
- /* No parameters */
- params = (const char **) palloc(sizeof(char *));
- params[0] = NULL;
- }
-
/* Setup parser */
xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
@@ -93,21 +187,18 @@ xslt_process(PG_FUNCTION_ARGS)
bool xslt_sec_prefs_error;
/*
- * Parse document. It's important to set an "URL", so libxslt includes
- * line numbers in error messages (cf. xsltPrintErrorContext()).
+ * Parse document.
*/
- doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
- XML_PARSE_NOENT);
+ doctree = xml_parse(doct, XMLOPTION_DOCUMENT, true,
+ GetDatabaseEncoding(), NULL, NULL, NULL);
if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing XML document");
/* Same for stylesheet */
- ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
- XML_PARSE_NOENT);
+ ssdoc = xml_parse(ssheet, XMLOPTION_DOCUMENT, true,
+ GetDatabaseEncoding(), NULL, NULL, NULL);
if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
@@ -198,18 +289,9 @@ xslt_process(PG_FUNCTION_ARGS)
xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, false);
- PG_RETURN_TEXT_P(result);
-#else /* !USE_LIBXSLT */
-
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("xslt_process() is not available without libxslt")));
- PG_RETURN_NULL();
-#endif /* USE_LIBXSLT */
+ return result;
}
-#ifdef USE_LIBXSLT
-
static const char **
parse_params(text *paramstr)
{
diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml
index 9fd613f967..dc6fb40121 100644
--- a/doc/src/sgml/xml2.sgml
+++ b/doc/src/sgml/xml2.sgml
@@ -408,22 +408,29 @@ ORDER BY doc_num, line_num;
</indexterm>
<synopsis>
-xslt_process(text document, text stylesheet, text paramlist) returns text
+xslt_process(xml document, xml stylesheet, text[] paramlist) returns xml
</synopsis>
<para>
This function applies the XSL stylesheet to the document and returns
- the transformed result. The <literal>paramlist</literal> is a list of parameter
- assignments to be used in the transformation, specified in the form
- <literal>a=1,b=2</literal>. Note that the
- parameter parsing is very simple-minded: parameter values cannot
- contain commas!
+ the transformed result. The <literal>paramlist</literal> is an array of parameter
+ assignments to be used in the transformation, specified in pairs of
+ key and value strings (e.g. <literal>ARRAY['a','1', 'b','2']</literal>).
+ The length of the array must be even.
+ Note that the values are still interpreted as XPath expressions, so string values need to
+ be quoted in single or double quotes (e.g. <literal>ARRAY['a','"string"']</literal>).
</para>
<para>
There is also a two-parameter version of <function>xslt_process</function> which
does not pass any parameters to the transformation.
</para>
+
+ <para>
+ <emphasis>Deprecated</emphasis> variants of <function>xslt_process</function> accepting
+ text arguments and parameters encoded into single text strings
+ (e.g. <literal>a=1,b=2</literal>) are also still available.
+ </para>
</sect3>
</sect2>
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 080d9d2ad9..c14352cb11 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -155,11 +155,11 @@ static int parse_xml_decl(const xmlChar *str, size_t *lenp,
static bool print_xml_decl(StringInfo buf, const xmlChar *version,
pg_enc encoding, int standalone);
static bool xml_doctype_in_content(const xmlChar *str);
-static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
- bool preserve_whitespace, int encoding,
- XmlOptionType *parsed_xmloptiontype,
- xmlNodePtr *parsed_nodes,
- Node *escontext);
+xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
+ bool preserve_whitespace, int encoding,
+ XmlOptionType *parsed_xmloptiontype,
+ xmlNodePtr *parsed_nodes,
+ Node *escontext);
static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
ArrayBuildState *astate,
@@ -1778,7 +1778,7 @@ xml_doctype_in_content(const xmlChar *str)
* TODO maybe libxml2's xmlreader is better? (do not construct DOM,
* yet do not use SAX - see xmlreader.c)
*/
-static xmlDocPtr
+xmlDocPtr
xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding,
XmlOptionType *parsed_xmloptiontype, xmlNodePtr *parsed_nodes,
@@ -1874,8 +1874,13 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate parser context");
+ /*
+ * Setting a dummy "SQL" URL is important for the
+ * xsltPrintErrorContext() when using the legacy text-based
+ * xslt_process() variant.
+ */
doc = xmlCtxtReadDoc(ctxt, utf8string,
- NULL, /* no URL */
+ "SQL",
"UTF-8",
options);
--
2.49.0
And yet again after the changes introduced in
93001888d85c21a5b9ab1fe8dabfecb673fc007c by Tom Lane I had to rebase my
patches (see attachments).
An open question on the 2nd patch remains how to properly expose
xml_parse(). That is, if somebody is interested in merging it at all.
You might want to apply only the first one as well.
Best regards,
Robin Haberkorn
On Mon Jul 7, 2025 at 19:15:52 GMT +03, Robin Haberkorn wrote:
After the changes introduced by Michael Paquier (commited as
732061150b004385810e522f8629f5bf91d977b7) I had to rebase my patches. Among
other things pg_xml_error_occurred() was already called except after
xsltSaveResultToString(). From what I understand, it's not really necessary
to mangle the encoding of the strings that underlie xmltype, so I removed
a FIXME comment in the 2nd patch compared to v3.IMHO the PG_XML_STRICTNESS_LEGACY can now also be removed from xpath.c,
but I will raise this question in the appropriate thread again.Best regards,
Robin HaberkornOn Wed May 7, 2025 at 11:16:37 GMT +03, Robin Haberkorn wrote:
Hello again,
I did prepare another patch, which introduces an xmltype-based alternative
for xslt_process(). I would really like to put that up for discussion. IMHO
with this patch, xslt_process() should be ready to be adopted into core.Unfortunately, it cleanly applies only on top of the error handling patch I
provided in this thread earlier and it wouldn't really make sense to create a
version that applies on top of master. It just touches the same code as the
previous patch, so they necessarily have to be applied in a certain order. So
I am unsure how to go on.I attached both patches in question as a series and would suggest to just
update the Commitfest entry. On the other hand, you might well accept the
first patch, but reject the second one.Yours sincerely,
Robin HaberkornOn Mon Apr 28, 2025 at 14:12:22 GMT +03, Robin Haberkorn wrote:
On Tue Apr 22, 2025 at 17:47:20 GMT +03, Robin Haberkorn wrote:
Perhaps you can tell me what else is preventing adoption into core. I believe
that xslt_process() should also accept the `xml` type as an alternative to
strings. Strings should be kept for backwards compatibility, though. Also,
the stylesheet parameter passing is very crude and limited. I suggest, that
the current method of passing them as strings (see parse_params()) is
deprecated and we don't try to tackle its limitations, but will instead
accept hstores. If you agree, I will work on these two tasks next.On the other hand, since hstore is a module, it seems we cannot really
have any other module or core depend on it, correct?
Perhaps the xmltype-variant of xslt_process() could accept string arrays
instead.
--
Robin Haberkorn
Software Engineer
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
Attachments:
v5-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchtext/x-patch; charset=utf-8; name=v5-0001-contrib-xml2-xslt_process-now-reports-XSLT-relate.patchDownload
From 335a4d2cd7ee5b199db99e8861774acbf046b6c1 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Thu, 17 Apr 2025 13:15:48 +0300
Subject: [PATCH v5 1/2] contrib/xml2: xslt_process() now reports XSLT-related
error details
* It sets (and restores) the libxslt error handler (xsltSetGenericErrorFunc()).
Since it only supports old-school "generic" error handlers, which are no longer
used in PG's libxml code, we reintroduced a "generic" error handler
xml_generic_error_handler() in xml.c.
* The alternative would have been to expose PgXmlErrorContext in xml.h,
so we could implement a generic handler in xslt_proc.c.
This is obviously not desirable, as it would depend on USE_LIBXML.
* No longer use the PG_XML_STRICTNESS_LEGACY for error handling as
the err_occurred flag was already checked via pg_xml_error_occurred()
since 732061150b004385810e522f8629f5bf91d977b7.
* This change means that xslt_process() now reports not only details
about XML parsing errors, but XSLT-schema deviations and missing
stylesheet parameters as well.
* The XSLT error messages now also contain line numbers.
For that to work, we had to set a dummy "SQL" URL when parsing XML strings.
This is important, since xsltPrintErrorContext() includes
line numbers only if an URL is set.
* The special xsltSaveResultToString() error handling has been removed.
It can practically only fail in OOM situations and there is no reason
to handle them any different than with the other libxslt functions.
* Updated test suite and added test case for detecting missing
stylesheet parameters.
This was initially reported here but has obviously been fixed in the
meantime:
https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com
---
contrib/xml2/expected/xml2.out | 17 +++++++++++++++++
contrib/xml2/sql/xml2.sql | 8 ++++++++
contrib/xml2/xslt_proc.c | 35 +++++++++++++++++++++++-----------
src/backend/utils/adt/xml.c | 29 ++++++++++++++++++++++++++++
src/include/utils/xml.h | 2 ++
5 files changed, 80 insertions(+), 11 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 3d97b14c3a..157d584e63 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -261,3 +261,20 @@ $$<xsl:stylesheet version="1.0"
</xsl:template>
</xsl:stylesheet>$$);
ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 7 element output
+File write for 0wn3d.txt refused
+runtime error: file SQL line 7 element output
+xsltDocumentElem: write rights for 0wn3d.txt denied
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
+ERROR: failed to apply stylesheet
+DETAIL: runtime error: file SQL line 3 element value-of
+Variable 'n1' has not been declared.
+Undefined variable
+runtime error: file SQL line 3 element value-of
+XPath evaluation returned no result.
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index ef99d164f2..9d42ac8a0b 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -153,3 +153,11 @@ $$<xsl:stylesheet version="1.0"
</sax:output>
</xsl:template>
</xsl:stylesheet>$$);
+
+-- detecting missing stylesheet parameter
+SELECT xslt_process('<xml/>',
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$)::xml;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index 53550c7dc2..17776f78b5 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -60,6 +60,10 @@ xslt_process(PG_FUNCTION_ARGS)
volatile int resstat = -1;
xmlChar *volatile resstr = NULL;
+ /* the previous libxslt error context */
+ xmlGenericErrorFunc saved_errfunc;
+ void *saved_errcxt;
+
if (fcinfo->nargs == 3)
{
paramstr = PG_GETARG_TEXT_PP(2);
@@ -73,7 +77,14 @@ xslt_process(PG_FUNCTION_ARGS)
}
/* Setup parser */
- xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
+ xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
+
+ /*
+ * Save the previous libxslt error context.
+ */
+ saved_errfunc = xsltGenericError;
+ saved_errcxt = xsltGenericErrorContext;
+ xsltSetGenericErrorFunc(xmlerrcxt, xml_generic_error_handler);
PG_TRY();
{
@@ -81,9 +92,12 @@ xslt_process(PG_FUNCTION_ARGS)
bool xslt_sec_prefs_error;
int reslen = 0;
- /* Parse document */
+ /*
+ * Parse document. It's important to set an "URL", so libxslt includes
+ * line numbers in error messages (cf. xsltPrintErrorContext()).
+ */
doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), NULL, NULL,
+ VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
XML_PARSE_NOENT);
if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
@@ -92,7 +106,7 @@ xslt_process(PG_FUNCTION_ARGS)
/* Same for stylesheet */
ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), NULL, NULL,
+ VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
XML_PARSE_NOENT);
if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
@@ -143,9 +157,10 @@ xslt_process(PG_FUNCTION_ARGS)
resstat = xsltSaveResultToString((xmlChar **) &resstr, &reslen,
restree, stylesheet);
-
- if (resstat >= 0)
- result = cstring_to_text_with_len((char *) resstr, reslen);
+ if (resstat < 0 || pg_xml_error_occurred(xmlerrcxt))
+ xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_ARGUMENT_FOR_XQUERY,
+ "failed to save result to string");
+ result = cstring_to_text_with_len((char *) resstr, reslen);
}
PG_CATCH();
{
@@ -163,6 +178,7 @@ xslt_process(PG_FUNCTION_ARGS)
xmlFree(resstr);
xsltCleanupGlobals();
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, true);
PG_RE_THROW();
@@ -179,12 +195,9 @@ xslt_process(PG_FUNCTION_ARGS)
if (resstr)
xmlFree(resstr);
+ xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, false);
- /* XXX this is pretty dubious, really ought to throw error instead */
- if (resstat < 0)
- PG_RETURN_NULL();
-
PG_RETURN_TEXT_P(result);
#else /* !USE_LIBXSLT */
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f7b731825f..f54828fb99 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -2135,6 +2135,35 @@ xml_errsave(Node *escontext, PgXmlErrorContext *errcxt,
detail ? errdetail_internal("%s", detail) : 0));
}
+/*
+ * Generic error handler for libxml errors and warnings.
+ * This is not used by this module, but may be useful for
+ * libxml-based libraries like libxslt, which do not support
+ * structured error handlers.
+ */
+void
+xml_generic_error_handler(void *data, const char *msg,...)
+{
+ PgXmlErrorContext *xmlerrcxt = (PgXmlErrorContext *) data;
+ va_list ap;
+
+ /*
+ * Defend against someone passing us a bogus context struct.
+ *
+ * We force a backend exit if this check fails because longjmp'ing out of
+ * libxslt would likely render it unsafe to use further.
+ */
+ if (xmlerrcxt->magic != ERRCXT_MAGIC)
+ elog(FATAL, "xml_generic_error_handler called with invalid PgXmlErrorContext");
+
+ appendStringInfoLineSeparator(&xmlerrcxt->err_buf);
+ va_start(ap, msg);
+ appendStringInfoVA(&xmlerrcxt->err_buf, msg, ap);
+ va_end(ap);
+
+ /* Get rid of any trailing newlines in errorBuf */
+ chopStringInfoNewlines(&xmlerrcxt->err_buf);
+}
/*
* Error handler for libxml errors and warnings
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 0d7a816b9f..7cb101e81d 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -66,6 +66,8 @@ extern void pg_xml_init_library(void);
extern PgXmlErrorContext *pg_xml_init(PgXmlStrictness strictness);
extern void pg_xml_done(PgXmlErrorContext *errcxt, bool isError);
extern bool pg_xml_error_occurred(PgXmlErrorContext *errcxt);
+extern void xml_generic_error_handler(void *data, const char *msg,...)
+ pg_attribute_printf(2, 3);
extern void xml_ereport(PgXmlErrorContext *errcxt, int level, int sqlcode,
const char *msg);
--
2.49.0
v5-0002-contrib-xml2-overloaded-xslt_process-to-provide-v.patchtext/x-patch; charset=utf-8; name=v5-0002-contrib-xml2-overloaded-xslt_process-to-provide-v.patchDownload
From bd2e6c9d4197d3f98db509ec112b6f21db97f287 Mon Sep 17 00:00:00 2001
From: Robin Haberkorn <haberkorn@b1-systems.de>
Date: Wed, 30 Apr 2025 13:19:42 +0300
Subject: [PATCH v5 2/2] contrib/xml2: overloaded xslt_process() to provide
variants for xmltype and specifying parameters in arrays
* There are apparently no functions that accept XML as text, except for xmlparse().
xslt_process() should therefore also accept xmltype.
* A version accepting text is still kept for backwards compatibility, but is considered
deprecated.
* The new xmltype-based version expects an array of stylesheet parameter-value pairs,
which is less limited than the now deprecated way of encoding all stylesheet parameters into
a single text argument.
We can now accept an arbitrary number of parameters and you can include `=` and `,` signs
in both the key and value strings.
Hstores haven't been used since they are in a module and we don't want to depend on any
additional module.
* The new implementation respects the database's encoding - text strings are always converted to UTF8
before passing them into libxml2.
* On the downside, xml_parse() had to be made an external function.
Since a declaration cannot be added to xml.h without drawing in libxml2 headers,
the declaration is repeated in xslt_proc.c.
Perhaps xml_parse() should be declared in a separate internal header?
* xmlCtxtReadDoc() now sets a dummy "SQL" URL to preserve line numbers in XSLT stylesheet errors.
This change at least does not break the test suite.
---
contrib/xml2/expected/xml2.out | 13 +++
contrib/xml2/sql/xml2.sql | 8 ++
contrib/xml2/xml2--1.1.sql | 11 +++
contrib/xml2/xslt_proc.c | 148 +++++++++++++++++++++++++--------
doc/src/sgml/xml2.sgml | 19 +++--
src/backend/utils/adt/xml.c | 19 +++--
6 files changed, 172 insertions(+), 46 deletions(-)
diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out
index 157d584e63..0a8a628020 100644
--- a/contrib/xml2/expected/xml2.out
+++ b/contrib/xml2/expected/xml2.out
@@ -278,3 +278,16 @@ Variable 'n1' has not been declared.
Undefined variable
runtime error: file SQL line 3 element value-of
XPath evaluation returned no result.
+-- xmltype and Array-based signature
+SELECT xslt_process(xmlelement(name xml),
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$::xml, ARRAY['n1','"foo"']);
+ xslt_process
+--------------
+ foo +
+
+(1 row)
+
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql
index 9d42ac8a0b..7555854d49 100644
--- a/contrib/xml2/sql/xml2.sql
+++ b/contrib/xml2/sql/xml2.sql
@@ -161,3 +161,11 @@ $$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<value-of select="$n1"/>
</template>
</stylesheet>$$)::xml;
+
+-- xmltype and Array-based signature
+SELECT xslt_process(xmlelement(name xml),
+$$<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
+ <template match="/">
+ <value-of select="$n1"/>
+ </template>
+</stylesheet>$$::xml, ARRAY['n1','"foo"']);
diff --git a/contrib/xml2/xml2--1.1.sql b/contrib/xml2/xml2--1.1.sql
index 671372cb27..a579a1e5e1 100644
--- a/contrib/xml2/xml2--1.1.sql
+++ b/contrib/xml2/xml2--1.1.sql
@@ -71,3 +71,14 @@ CREATE FUNCTION xslt_process(text,text)
RETURNS text
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
+
+CREATE FUNCTION xslt_process(xml,xml,text[])
+RETURNS xml
+AS 'MODULE_PATHNAME','xslt_process_xmltype'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+-- the function checks for the correct argument count
+CREATE FUNCTION xslt_process(xml,xml)
+RETURNS xml
+AS 'MODULE_PATHNAME','xslt_process_xmltype'
+LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index 17776f78b5..074952cf8b 100644
--- a/contrib/xml2/xslt_proc.c
+++ b/contrib/xml2/xslt_proc.c
@@ -10,6 +10,9 @@
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/xml.h"
+#include "utils/array.h"
+#include "utils/memutils.h"
+#include "mb/pg_wchar.h"
#ifdef USE_LIBXSLT
@@ -35,9 +38,18 @@
extern PgXmlErrorContext *pgxml_parser_init(PgXmlStrictness strictness);
/* local defs */
+static xmltype *xslt_process_internal(xmltype *doct, xmltype *ssheet, const char **params);
static const char **parse_params(text *paramstr);
#endif /* USE_LIBXSLT */
+/*
+ * FIXME: This cannot easily be exposed in xml.h.
+ * Perhaps there should be an xml-internal.h?
+ */
+xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
+ bool preserve_whitespace, int encoding,
+ XmlOptionType *parsed_xmloptiontype, xmlNodePtr *parsed_nodes,
+ Node *escontext);
PG_FUNCTION_INFO_V1(xslt_process);
@@ -48,9 +60,103 @@ xslt_process(PG_FUNCTION_ARGS)
text *doct = PG_GETARG_TEXT_PP(0);
text *ssheet = PG_GETARG_TEXT_PP(1);
- text *volatile result = NULL;
- text *paramstr;
- const char **params;
+ const char **params = NULL;
+ text *result;
+
+ if (fcinfo->nargs == 3)
+ {
+ text *paramstr = PG_GETARG_TEXT_PP(2);
+
+ params = parse_params(paramstr);
+ }
+
+ result = xslt_process_internal(doct, ssheet, params);
+
+ PG_RETURN_TEXT_P(result);
+
+#else /* !USE_LIBXSLT */
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("xslt_process() is not available without libxslt")));
+ PG_RETURN_NULL();
+
+#endif /* USE_LIBXSLT */
+}
+
+PG_FUNCTION_INFO_V1(xslt_process_xmltype);
+
+Datum
+xslt_process_xmltype(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXSLT
+
+ xmltype *doct = PG_GETARG_XML_P(0);
+ xmltype *ssheet = PG_GETARG_XML_P(1);
+ const char **params = NULL;
+ xmltype *result;
+
+ /*
+ * Parameters are key-value pairs. The values are XPath expressions, so
+ * strings will have to be escaped with single or double quotes. Even
+ * `xsltproc --stringparam` does nothing else than adding single or double
+ * quotes and fails if the value contains both.
+ */
+ if (fcinfo->nargs == 3)
+ {
+ ArrayType *paramarray = PG_GETARG_ARRAYTYPE_P(2);
+ Datum *arr_datums;
+ bool *arr_nulls;
+ int arr_count;
+ int i,
+ j;
+
+ deconstruct_array_builtin(paramarray, TEXTOID, &arr_datums, &arr_nulls, &arr_count);
+
+ if ((arr_count % 2) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_ARRAY_ELEMENT_ERROR),
+ errmsg("number of stylesheet parameters (%d) must be a multiple of 2",
+ arr_count)));
+
+ params = palloc_array(const char *, arr_count + 1);
+
+ for (i = 0, j = 0; i < arr_count; i++)
+ {
+ char *cstr;
+
+ if (arr_nulls[i])
+ continue;
+
+ cstr = TextDatumGetCString(arr_datums[i]);
+ params[j++] = (char *) pg_do_encoding_conversion((unsigned char *) cstr,
+ strlen(cstr),
+ GetDatabaseEncoding(),
+ PG_UTF8);
+ }
+ params[j] = NULL;
+ }
+
+ result = xslt_process_internal(doct, ssheet, params);
+
+ PG_RETURN_XML_P(result);
+
+#else /* !USE_LIBXSLT */
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("xslt_process() is not available without libxslt")));
+ PG_RETURN_NULL();
+
+#endif /* USE_LIBXSLT */
+}
+
+#ifdef USE_LIBXSLT
+
+static xmltype *
+xslt_process_internal(xmltype *doct, xmltype *ssheet, const char **params)
+{
+ text *volatile result;
PgXmlErrorContext *xmlerrcxt;
volatile xsltStylesheetPtr stylesheet = NULL;
volatile xmlDocPtr doctree = NULL;
@@ -64,18 +170,6 @@ xslt_process(PG_FUNCTION_ARGS)
xmlGenericErrorFunc saved_errfunc;
void *saved_errcxt;
- if (fcinfo->nargs == 3)
- {
- paramstr = PG_GETARG_TEXT_PP(2);
- params = parse_params(paramstr);
- }
- else
- {
- /* No parameters */
- params = (const char **) palloc(sizeof(char *));
- params[0] = NULL;
- }
-
/* Setup parser */
xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
@@ -93,21 +187,18 @@ xslt_process(PG_FUNCTION_ARGS)
int reslen = 0;
/*
- * Parse document. It's important to set an "URL", so libxslt includes
- * line numbers in error messages (cf. xsltPrintErrorContext()).
+ * Parse document.
*/
- doctree = xmlReadMemory((char *) VARDATA_ANY(doct),
- VARSIZE_ANY_EXHDR(doct), "SQL", NULL,
- XML_PARSE_NOENT);
+ doctree = xml_parse(doct, XMLOPTION_DOCUMENT, true,
+ GetDatabaseEncoding(), NULL, NULL, NULL);
if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"error parsing XML document");
/* Same for stylesheet */
- ssdoc = xmlReadMemory((char *) VARDATA_ANY(ssheet),
- VARSIZE_ANY_EXHDR(ssheet), "SQL", NULL,
- XML_PARSE_NOENT);
+ ssdoc = xml_parse(ssheet, XMLOPTION_DOCUMENT, true,
+ GetDatabaseEncoding(), NULL, NULL, NULL);
if (ssdoc == NULL || pg_xml_error_occurred(xmlerrcxt))
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
@@ -198,18 +289,9 @@ xslt_process(PG_FUNCTION_ARGS)
xsltSetGenericErrorFunc(saved_errcxt, saved_errfunc);
pg_xml_done(xmlerrcxt, false);
- PG_RETURN_TEXT_P(result);
-#else /* !USE_LIBXSLT */
-
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("xslt_process() is not available without libxslt")));
- PG_RETURN_NULL();
-#endif /* USE_LIBXSLT */
+ return result;
}
-#ifdef USE_LIBXSLT
-
static const char **
parse_params(text *paramstr)
{
diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml
index 9fd613f967..dc6fb40121 100644
--- a/doc/src/sgml/xml2.sgml
+++ b/doc/src/sgml/xml2.sgml
@@ -408,22 +408,29 @@ ORDER BY doc_num, line_num;
</indexterm>
<synopsis>
-xslt_process(text document, text stylesheet, text paramlist) returns text
+xslt_process(xml document, xml stylesheet, text[] paramlist) returns xml
</synopsis>
<para>
This function applies the XSL stylesheet to the document and returns
- the transformed result. The <literal>paramlist</literal> is a list of parameter
- assignments to be used in the transformation, specified in the form
- <literal>a=1,b=2</literal>. Note that the
- parameter parsing is very simple-minded: parameter values cannot
- contain commas!
+ the transformed result. The <literal>paramlist</literal> is an array of parameter
+ assignments to be used in the transformation, specified in pairs of
+ key and value strings (e.g. <literal>ARRAY['a','1', 'b','2']</literal>).
+ The length of the array must be even.
+ Note that the values are still interpreted as XPath expressions, so string values need to
+ be quoted in single or double quotes (e.g. <literal>ARRAY['a','"string"']</literal>).
</para>
<para>
There is also a two-parameter version of <function>xslt_process</function> which
does not pass any parameters to the transformation.
</para>
+
+ <para>
+ <emphasis>Deprecated</emphasis> variants of <function>xslt_process</function> accepting
+ text arguments and parameters encoded into single text strings
+ (e.g. <literal>a=1,b=2</literal>) are also still available.
+ </para>
</sect3>
</sect2>
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f54828fb99..3b8ab29555 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -155,11 +155,11 @@ static int parse_xml_decl(const xmlChar *str, size_t *lenp,
static bool print_xml_decl(StringInfo buf, const xmlChar *version,
pg_enc encoding, int standalone);
static bool xml_doctype_in_content(const xmlChar *str);
-static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
- bool preserve_whitespace, int encoding,
- XmlOptionType *parsed_xmloptiontype,
- xmlNodePtr *parsed_nodes,
- Node *escontext);
+xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
+ bool preserve_whitespace, int encoding,
+ XmlOptionType *parsed_xmloptiontype,
+ xmlNodePtr *parsed_nodes,
+ Node *escontext);
static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
ArrayBuildState *astate,
@@ -1783,7 +1783,7 @@ xml_doctype_in_content(const xmlChar *str)
* TODO maybe libxml2's xmlreader is better? (do not construct DOM,
* yet do not use SAX - see xmlreader.c)
*/
-static xmlDocPtr
+xmlDocPtr
xml_parse(text *data, XmlOptionType xmloption_arg,
bool preserve_whitespace, int encoding,
XmlOptionType *parsed_xmloptiontype, xmlNodePtr *parsed_nodes,
@@ -1879,8 +1879,13 @@ xml_parse(text *data, XmlOptionType xmloption_arg,
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate parser context");
+ /*
+ * Setting a dummy "SQL" URL is important for the
+ * xsltPrintErrorContext() when using the legacy text-based
+ * xslt_process() variant.
+ */
doc = xmlCtxtReadDoc(ctxt, utf8string,
- NULL, /* no URL */
+ "SQL",
"UTF-8",
options);
--
2.49.0