Another issue with invalid XML values

Started by Florian Pflugover 14 years ago46 messages
#1Florian Pflug
fgp@phlo.org

Hi

Unfortunately, I found another way to produce invalid XML values.

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as xmlns))))[1];
xpath
-------------------
<root xmlns="<"/>

Since a literal "<" is not allowed in XML attributes, this XML value is not well-formed. And indeed

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as xmlns))))[1]::TEXT::XML;
ERROR: invalid XML content
DETAIL: Entity: line 1: parser error : Unescaped '<' not allowed in attributes values

Note that this only affects namespace declarations (xmlns). The following case works correctly

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as value))))[1];
xpath
----------------------
<root value="&lt;"/>

The root of this issue is that "<" isn't a valid namespace URI to begin with, since "<" isn't in the set of allowed characters for URIs. Thus, when converting an XML node back to text, libxml doesn't escape xmlns attribute values.

I don't have a good solution for this issue yet. Special-casing attributes called "xmlns" (or "xmlns:<prefix>") in XMLATTRIBUTES() solves only part of the problem - the TEXT to XML cast is similarly lenient and doesn't complain if you do '<root xmlns="&lt;"/>'::XML.

Why this cast succeeds is somewhat beyond me though - piping the very same XML document into xmllint produces

$ echo '<root xmlns="&lt;"/>' | xmllint -
-:1: namespace error : xmlns: '<' is not a valid URI

My nagging suspicion is that libxml reports errors like there via some callback function, and only returns a non-zero result if there are structural errors in the XML. But my experience with libxml is pretty limited, so maybe someone with more experience in this area can shed some light on this...

best regards,
Florian Pflug

#2Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#1)
Re: Another issue with invalid XML values

On Jun1, 2011, at 03:17 , Florian Pflug wrote:

My nagging suspicion is that libxml reports errors like there via some callback function, and only returns a non-zero result if there are structural errors in the XML. But my experience with libxml is pretty limited, so maybe someone with more experience in this area can shed some light on this...

As it turns out, this is actually the case.

libxml reports some errors (like invalid xmlns attributes) via the error handler set using xmlSetGenericErrorFunc() but still returns zero (indicating success) from xmlCtxtReadDoc() and xmlParseBalancedChunkMemory().

If I modify xml_parse() to complain not only if one of these functions return non-zero, but also if xml_err_buf has non-zero length, invalid xmlns attributes are reported correctly.

However, the error function set using xmlSetGenericErrorFunc() cannot distinguish between error and warnings, so doing this causes XMLPARSE() to also complain about things like non-absolute namespace URIs (which are allowed but deprecated as far as I understand).

To fix that, xmlSetGenericErrorFunc() would probably have to be replace by xmlSetStructuredErrorFunc(). Structured error functions receive a pointer to an xmlError structore which, amongst other things, contains an xmlErrorLevel (NONE, WARNING, ERROR, FATAL).

While digging through the code in src/backend/utils/adt/xml.c, I also noticed that we set a global error handler instead of a per-context one. I guess this is because xmlParseBalancedChunkMemory(), which we use to parse XML fragments, doesn't provide a way to pass in a context but rather creates it itself. Still, I wonder if there isn't some other API which we could use which does allow us to specify a context. Again, it'd be nice if someone more familiar with this code could explain the reasons behind the current design.

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

best regards,
Florian Pflug

#3Noah Misch
noah@leadboat.com
In reply to: Florian Pflug (#2)
Re: Another issue with invalid XML values

On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:

On Jun1, 2011, at 03:17 , Florian Pflug wrote:

My nagging suspicion is that libxml reports errors like there via some callback function, and only returns a non-zero result if there are structural errors in the XML. But my experience with libxml is pretty limited, so maybe someone with more experience in this area can shed some light on this...

As it turns out, this is actually the case.

Thanks for getting to the bottom of this. I had wondered why, for some versions
of libxml2, xml_in would accept things that xmllint rejected. Sounds like the
list of errors that actually affect the return code has changed over time.

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

Sounds sensible. Will this impose any new libxml2 version dependency?

#4Florian Pflug
fgp@phlo.org
In reply to: Noah Misch (#3)
Re: Another issue with invalid XML values

On Jun2, 2011, at 00:02 , Noah Misch wrote:

On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

Sounds sensible. Will this impose any new libxml2 version dependency?

xmlSetStructuredErrorFunc() seems to be available starting with libxml 2.6.0,
release on Oct 20, 2003. Since we already require the version to be >= 2.6.23,
we should be OK.

I won't have access to my PC the next few days, but I'll try to come up with
a patch some time next week.

best regards,
Florian Pflug

#5Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#4)
1 attachment(s)
Re: Another issue with invalid XML values

On Jun2, 2011, at 01:34 , Florian Pflug wrote:

On Jun2, 2011, at 00:02 , Noah Misch wrote:

On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

Sounds sensible. Will this impose any new libxml2 version dependency?

xmlSetStructuredErrorFunc() seems to be available starting with libxml 2.6.0,
release on Oct 20, 2003. Since we already require the version to be >= 2.6.23,
we should be OK.

I won't have access to my PC the next few days, but I'll try to come up with
a patch some time next week.

Phew... I did manage to produce a patch, but it was way more work than I
had intended to put into this.

As it turns out, you loose the nicely formatted context information that
libxml2 provides via the generic error func once you switch to structured
error reporting. Registering handlers for both doesn't help either, since
the generic error handler isn't called once you register a structured one.

Fortunately, libxml does export xmlParserPrintFileContext() which
generates these context messages. It, however, doesn't return a string,
but instead passes them to the generic error handler (this time, independent
from whether a structural error handler is registered or not).

As it stood, the code assumed that all third-party library re-install
their libxml error handlers before each library call, and thus didn't
bother to restore the old error handler itself. Since I revamped the
error handling anyway, I removed that requirement. There is now a
function pg_xml_done() which restores the original error handler that
we overwrote in pg_xml_init().

I also realized that some libxml error (like undefined namespace prefixes)
must be ignored during xmlparse() and friends. Otherwise, it becomes
impossible to build XML documents from individual fragments. pg_xml_init()
therefore now takes an argument which specifies how strict the error
checking is supposed to be. For the moment, only XPATH() uses the strict
mode in which we report all errors. XMLPARSE() and friends only report
parse errors, not namespace errors.

Finally, I had to adjust contrib/xml2 because it uses some parts of
the core XML support like pg_xml_init().

Heres the indended behaviour with the patch applied:
----------------------------------------------------

We always use structured error handling. For now, the error messages
pretty much resemble the old ones, but it's now easy to add additional
information.

XMLPARSE() and casting to XML check for parse errors only, like they do
without the patch. They're also capable of reporting warnings, but I
didn't find a case where the libxml parser generates a warning.

XPATH() reports all errors and warnings. Trying to use XPATH() on
a document with e.g. inconsistent namespace usage or invalid
namespace URIs therefore now raises an error. This is *necessary*
because libxml's XPath evaluator gets confused if it encounters
e.g. invalid namespace URI and outputs invalid XML in response.

contrib/xml2's behaviour hasn't changed.

Patch is attached, and comments are welcome.

best regards,
Florian Pflug

Attachments:

pg_xml_errorhandling.v1.patchapplication/octet-stream; name=pg_xml_errorhandling.v1.patchDownload
diff --git a/contrib/xml2/xpath.c b/contrib/xml2/xpath.c
index 44c600e..556b5a1 100644
*** a/contrib/xml2/xpath.c
--- b/contrib/xml2/xpath.c
*************** Datum		xpath_table(PG_FUNCTION_ARGS);
*** 38,44 ****
  
  /* exported for use by xslt_proc.c */
  
! void		pgxml_parser_init(void);
  
  /* workspace for pgxml_xpath() */
  
--- 38,44 ----
  
  /* exported for use by xslt_proc.c */
  
! void		pgxml_parser_init(XmlPurposeType purpose);
  
  /* workspace for pgxml_xpath() */
  
*************** static void cleanup_workspace(xpath_work
*** 70,79 ****
   * Initialize for xml parsing.
   */
  void
! pgxml_parser_init(void)
  {
  	/* Set up error handling (we share the core's error handler) */
! 	pg_xml_init();
  
  	/* Initialize libxml */
  	xmlInitParser();
--- 70,79 ----
   * Initialize for xml parsing.
   */
  void
! pgxml_parser_init(XmlPurposeType purpose)
  {
  	/* Set up error handling (we share the core's error handler) */
! 	pg_xml_init(purpose);
  
  	/* Initialize libxml */
  	xmlInitParser();
*************** xml_is_well_formed(PG_FUNCTION_ARGS)
*** 100,113 ****
  	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
  	int32		docsize = VARSIZE(t) - VARHDRSZ;
  	xmlDocPtr	doctree;
  
! 	pgxml_parser_init();
  
  	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
! 	if (doctree == NULL)
! 		PG_RETURN_BOOL(false);	/* i.e. not well-formed */
  	xmlFreeDoc(doctree);
! 	PG_RETURN_BOOL(true);
  }
  
  
--- 100,116 ----
  	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
  	int32		docsize = VARSIZE(t) - VARHDRSZ;
  	xmlDocPtr	doctree;
+ 	bool		result;
  
! 	pgxml_parser_init(XML_PURPOSE_LEGACY);
  
  	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
! 	result = (doctree != NULL);
  	xmlFreeDoc(doctree);
! 	
! 	pg_xml_done();
! 	
! 	PG_RETURN_BOOL(result);
  }
  
  
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 406,412 ****
  	workspace->ctxt = NULL;
  	workspace->res = NULL;
  
! 	pgxml_parser_init();
  
  	workspace->doctree = xmlParseMemory((char *) VARDATA(document), docsize);
  	if (workspace->doctree == NULL)
--- 409,415 ----
  	workspace->ctxt = NULL;
  	workspace->res = NULL;
  
! 	pgxml_parser_init(XML_PURPOSE_LEGACY);
  
  	workspace->doctree = xmlParseMemory((char *) VARDATA(document), docsize);
  	if (workspace->doctree == NULL)
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 420,425 ****
--- 423,429 ----
  	if (comppath == NULL)
  	{
  		cleanup_workspace(workspace);
+ 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"XPath Syntax Error");
  	}
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 433,438 ****
--- 437,444 ----
  	if (res == NULL)
  		cleanup_workspace(workspace);
  
+ 	pg_xml_done();
+ 
  	return res;
  }
  
*************** xpath_table(PG_FUNCTION_ARGS)
*** 659,665 ****
  	 * Setup the parser.  This should happen after we are done evaluating the
  	 * query, in case it calls functions that set up libxml differently.
  	 */
! 	pgxml_parser_init();
  
  	/* For each row i.e. document returned from SPI */
  	for (i = 0; i < proc; i++)
--- 665,671 ----
  	 * Setup the parser.  This should happen after we are done evaluating the
  	 * query, in case it calls functions that set up libxml differently.
  	 */
! 	pgxml_parser_init(XML_PURPOSE_LEGACY);
  
  	/* For each row i.e. document returned from SPI */
  	for (i = 0; i < proc; i++)
*************** xpath_table(PG_FUNCTION_ARGS)
*** 720,725 ****
--- 726,732 ----
  					if (comppath == NULL)
  					{
  						xmlFreeDoc(doctree);
+ 						pg_xml_done();
  						xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  									"XPath Syntax Error");
  					}
*************** xpath_table(PG_FUNCTION_ARGS)
*** 784,789 ****
--- 791,798 ----
  			pfree(xmldoc);
  	}
  
+ 	pg_xml_done();
+ 
  	tuplestore_donestoring(tupstore);
  
  	SPI_finish();
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index f8f7d72..b71cafe 100644
*** a/contrib/xml2/xslt_proc.c
--- b/contrib/xml2/xslt_proc.c
*************** Datum		xslt_process(PG_FUNCTION_ARGS);
*** 38,44 ****
  #ifdef USE_LIBXSLT
  
  /* declarations to come from xpath.c */
! extern void pgxml_parser_init(void);
  
  /* local defs */
  static const char **parse_params(text *paramstr);
--- 38,44 ----
  #ifdef USE_LIBXSLT
  
  /* declarations to come from xpath.c */
! extern void pgxml_parser_init(XmlPurposeType purpose);
  
  /* local defs */
  static const char **parse_params(text *paramstr);
*************** xslt_process(PG_FUNCTION_ARGS)
*** 77,83 ****
  	}
  
  	/* Setup parser */
! 	pgxml_parser_init();
  
  	/* Check to see if document is a file or a literal */
  
--- 77,83 ----
  	}
  
  	/* Setup parser */
! 	pgxml_parser_init(XML_PURPOSE_LEGACY);
  
  	/* Check to see if document is a file or a literal */
  
*************** xslt_process(PG_FUNCTION_ARGS)
*** 86,94 ****
  	else
  		doctree = xmlParseFile(text_to_cstring(doct));
  
! 	if (doctree == NULL)
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"error parsing XML document");
  
  	/* Same for stylesheet */
  	if (VARDATA(ssheet)[0] == '<')
--- 86,96 ----
  	else
  		doctree = xmlParseFile(text_to_cstring(doct));
  
! 	if (doctree == NULL) {
! 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"error parsing XML document");
+ 	}
  
  	/* Same for stylesheet */
  	if (VARDATA(ssheet)[0] == '<')
*************** xslt_process(PG_FUNCTION_ARGS)
*** 98,103 ****
--- 100,106 ----
  		if (ssdoc == NULL)
  		{
  			xmlFreeDoc(doctree);
+ 			pg_xml_done();
  			xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  						"error parsing stylesheet as XML document");
  		}
*************** xslt_process(PG_FUNCTION_ARGS)
*** 112,117 ****
--- 115,121 ----
  	{
  		xmlFreeDoc(doctree);
  		xsltCleanupGlobals();
+ 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"failed to parse stylesheet");
  	}
*************** xslt_process(PG_FUNCTION_ARGS)
*** 125,130 ****
--- 129,136 ----
  
  	xsltCleanupGlobals();
  
+ 	pg_xml_done();
+ 
  	if (resstat < 0)
  		PG_RETURN_NULL();
  
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 702b9e3..3922bb9 100644
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***************
*** 82,92 ****
  int			xmlbinary;
  int			xmloption;
  
  #ifdef USE_LIBXML
  
! static StringInfo xml_err_buf = NULL;
  
! static void xml_errorHandler(void *ctxt, const char *msg,...);
  static void xml_ereport_by_code(int level, int sqlcode,
  					const char *msg, int errcode);
  
--- 82,102 ----
  int			xmlbinary;
  int			xmloption;
  
+ /* Flag indicating whether libxml reported an error */
+ bool xml_error_occured = false;
+ 
  #ifdef USE_LIBXML
  
! static bool xml_error_initialized = false;
! static xmlStructuredErrorFunc xml_structuredErrorFunc_saved = NULL;
! static void* xml_structuredErrorContext_saved = NULL;
  
! static XmlPurposeType xml_purpose = XML_PURPOSE_NONE;
! static StringInfo xml_error_detail_buf = NULL;
! 
! static void xml_errorHandler(void* data, xmlErrorPtr error);
! static void appendStringInfoLineSeparator(StringInfo str);
! static void chopStringInfoNewlines(StringInfo str);
  static void xml_ereport_by_code(int level, int sqlcode,
  					const char *msg, int errcode);
  
*************** xmlelement(XmlExprState *xmlExpr, ExprCo
*** 597,614 ****
  	}
  
  	/* now safe to run libxml */
! 	pg_xml_init();
  
  	PG_TRY();
  	{
  		buf = xmlBufferCreate();
! 		if (!buf)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate xmlBuffer");
  		writer = xmlNewTextWriterMemory(buf, 0);
! 		if (!writer)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate xmlTextWriter");
  
  		xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
  
--- 607,622 ----
  	}
  
  	/* now safe to run libxml */
! 	pg_xml_init(XML_PURPOSE_OTHER);
  
  	PG_TRY();
  	{
  		buf = xmlBufferCreate();
! 		XML_REPORT_ERROR(buf != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 						 "could not allocate xmlBuffer");
  		writer = xmlNewTextWriterMemory(buf, 0);
! 		XML_REPORT_ERROR(writer != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 						 "could not allocate xmlTextWriter");
  
  		xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
  
*************** xmlelement(XmlExprState *xmlExpr, ExprCo
*** 644,655 ****
--- 652,668 ----
  			xmlFreeTextWriter(writer);
  		if (buf)
  			xmlBufferFree(buf);
+ 			
+ 		pg_xml_done();
+ 		
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlBufferFree(buf);
  
+ 	pg_xml_done();
+ 
  	return result;
  #else
  	NO_XML_SUPPORT();
*************** xml_is_document(xmltype *arg)
*** 848,856 ****
   * This should be called by each function that is about to use libxml
   * facilities.	It has two responsibilities: verify compatibility with the
   * loaded libxml version (done on first call in a session) and establish
!  * or re-establish our libxml error handler.  The latter needs to be done
!  * anytime we might have passed control to add-on modules (eg libperl) which
!  * might have set their own error handler for libxml.
   *
   * This is exported for use by contrib/xml2, as well as other code that might
   * wish to share use of this module's libxml error handler.
--- 861,871 ----
   * This should be called by each function that is about to use libxml
   * facilities.	It has two responsibilities: verify compatibility with the
   * loaded libxml version (done on first call in a session) and establish
!  * our libxml error handler. The purpose determines which errors are
!  * reported and which are ignored. If the purpose is XML_PURPOSE_NONE,
!  * error handling is skipped entirely. If the purpose is any other value,
!  * pg_xml_done() *must* be called after the caller is done using libxml
!  * to restore the original error handler.
   *
   * This is exported for use by contrib/xml2, as well as other code that might
   * wish to share use of this module's libxml error handler.
*************** xml_is_document(xmltype *arg)
*** 859,868 ****
   * check)
   */
  void
! pg_xml_init(void)
  {
  	static bool first_time = true;
! 
  	if (first_time)
  	{
  		/* Stuff we need do only once per session */
--- 874,883 ----
   * check)
   */
  void
! pg_xml_init(XmlPurposeType purpose)
  {
  	static bool first_time = true;
! 	
  	if (first_time)
  	{
  		/* Stuff we need do only once per session */
*************** pg_xml_init(void)
*** 880,890 ****
  
  		/* create error buffer in permanent context */
  		oldcontext = MemoryContextSwitchTo(TopMemoryContext);
! 		xml_err_buf = makeStringInfo();
  		MemoryContextSwitchTo(oldcontext);
! 
! 		/* Now that xml_err_buf exists, safe to call xml_errorHandler */
! 		xmlSetGenericErrorFunc(NULL, xml_errorHandler);
  
  #ifdef USE_LIBXMLCONTEXT
  		/* Set up memory allocation our way, too */
--- 895,903 ----
  
  		/* create error buffer in permanent context */
  		oldcontext = MemoryContextSwitchTo(TopMemoryContext);
! 		xml_error_detail_buf = makeStringInfo();
  		MemoryContextSwitchTo(oldcontext);
! 		xml_error_occured = false;
  
  #ifdef USE_LIBXMLCONTEXT
  		/* Set up memory allocation our way, too */
*************** pg_xml_init(void)
*** 896,916 ****
  
  		first_time = false;
  	}
! 	else
! 	{
! 		/* Reset pre-existing buffer to empty */
! 		Assert(xml_err_buf != NULL);
! 		resetStringInfo(xml_err_buf);
  
! 		/*
! 		 * We re-establish the error callback function every time.	This makes
! 		 * it safe for other subsystems (PL/Perl, say) to also use libxml with
! 		 * their own callbacks ... so long as they likewise set up the
! 		 * callbacks on every use. It's cheap enough to not be worth worrying
! 		 * about, anyway.
! 		 */
! 		xmlSetGenericErrorFunc(NULL, xml_errorHandler);
  	}
  }
  
  
--- 909,983 ----
  
  		first_time = false;
  	}
! 	
! 	/* Purpose XML_PURPOSE_NONE tells us to only setup libxml, not
! 	 * the error handler. pg_xml_done() must *not* be called in this
! 	 * case!
! 	 */
! 	if (purpose == XML_PURPOSE_NONE)
! 		return;
! 		
! 	/* Imbalanced calls of pg_xml_init() and pg_xml_done() */
! 	Assert(!xml_error_initialized);	
! 	Assert(xml_purpose == XML_PURPOSE_NONE);
! 	
! 	/* Set purpose */
! 	xml_purpose = purpose;
! 		
! 	/* Reset pre-existing buffer to empty */
! 	Assert(xml_error_detail_buf != NULL);
! 	resetStringInfo(xml_error_detail_buf);
! 	xml_error_occured = false;
! 	
! 	/* Save original error handler and install ours */
! 	xml_structuredErrorFunc_saved = xmlStructuredError;
! 	xml_structuredErrorContext_saved = xmlStructuredErrorContext;
! 	xmlSetStructuredErrorFunc(NULL, xml_errorHandler);
  
! 	xml_error_initialized = true;
! }
! 
! 
! /*
!  * pg_xml_done --- restore libxml state after pg_xml_init().
!  * 
!  * This must be called if pg_xml_init() was called with a
!  * purpose other than XML_PURPOSE_NONE. Resets libxml's global state
!  * (i.e. the structured error handler) to the original state.
!  *
!  * It is OK to call xml_ereport() after pg_xml_done() - 
!  * pg_xml_done() leaves xml_error_detail_buf as it is.
!  */
! void
! pg_xml_done(void)
! {
! 	/* Imbalanced calls of pg_xml_init() and pg_xml_done(). */
! 	Assert(xml_error_initialized);
! 	Assert(xml_purpose != XML_PURPOSE_NONE);
! 	
! 	xmlSetStructuredErrorFunc(xml_structuredErrorContext_saved,
! 							  xml_structuredErrorFunc_saved);
! 	xml_purpose = XML_PURPOSE_NONE;
! 	xml_error_initialized = false;
! }
! 
! 
! 
! /*
!  * pg_xml_erroroccurred() --- Test and reset the error flag.
!  *
!  * Returns true if an libxml error occurred after the last call of
!  * this function.
!  */
! bool
! pg_xml_erroroccurred(void)
! {
! 	if (xml_error_occured) {
! 		xml_error_occured = false;
! 		return true;
  	}
+ 	
+ 	return false;
  }
  
  
*************** parse_xml_decl(const xmlChar *str, size_
*** 969,975 ****
  	int			utf8char;
  	int			utf8len;
  
! 	pg_xml_init();
  
  	/* Initialize output arguments to "not present" */
  	if (version)
--- 1036,1046 ----
  	int			utf8char;
  	int			utf8len;
  
! 	/* Only initialize libxml. We don't need error handling here,
! 	 * but we do need to make sure libxml is initialized before
! 	 * calling any of its functions
! 	 */
! 	pg_xml_init(XML_PURPOSE_NONE);
  
  	/* Initialize output arguments to "not present" */
  	if (version)
*************** static bool
*** 1123,1130 ****
  print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone)
  {
- 	pg_xml_init();				/* why is this here? */
- 
  	if ((version && strcmp((char *) version, PG_XML_DEFAULT_VERSION) != 0)
  		|| (encoding && encoding != PG_UTF8)
  		|| standalone != -1)
--- 1194,1199 ----
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1175,1182 ****
  	int32		len;
  	xmlChar    *string;
  	xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt;
! 	xmlDocPtr	doc;
  
  	len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
  	string = xml_text2xmlChar(data);
--- 1244,1251 ----
  	int32		len;
  	xmlChar    *string;
  	xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt = NULL;
! 	xmlDocPtr	doc = NULL;
  
  	len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
  	string = xml_text2xmlChar(data);
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1187,1203 ****
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
! 	pg_xml_init();
  	xmlInitParser();
  
- 	ctxt = xmlNewParserCtxt();
- 	if (ctxt == NULL)
- 		xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
- 					"could not allocate parser context");
- 
  	/* Use a TRY block to ensure the ctxt is released */
  	PG_TRY();
  	{
  		if (xmloption_arg == XMLOPTION_DOCUMENT)
  		{
  			/*
--- 1256,1271 ----
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
! 	pg_xml_init(XML_PURPOSE_WELLFORMED);
  	xmlInitParser();
  
  	/* Use a TRY block to ensure the ctxt is released */
  	PG_TRY();
  	{
+ 		ctxt = xmlNewParserCtxt();
+ 		XML_REPORT_ERROR(ctxt != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 						 "could not allocate parser context");
+ 
  		if (xmloption_arg == XMLOPTION_DOCUMENT)
  		{
  			/*
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1212,1220 ****
  								 "UTF-8",
  								 XML_PARSE_NOENT | XML_PARSE_DTDATTR
  						   | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
! 			if (doc == NULL)
! 				xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 							"invalid XML document");
  		}
  		else
  		{
--- 1280,1287 ----
  								 "UTF-8",
  								 XML_PARSE_NOENT | XML_PARSE_DTDATTR
  						   | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
! 			XML_REPORT_ERROR(doc != NULL, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 							 "invalid XML document");
  		}
  		else
  		{
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1237,1259 ****
  
  			res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
  												   utf8string + count, NULL);
! 			if (res_code != 0)
! 			{
! 				xmlFreeDoc(doc);
! 				xml_ereport(ERROR, ERRCODE_INVALID_XML_CONTENT,
! 							"invalid XML content");
! 			}
  		}
  	}
  	PG_CATCH();
  	{
! 		xmlFreeParserCtxt(ctxt);
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlFreeParserCtxt(ctxt);
! 
  	return doc;
  }
  
--- 1304,1330 ----
  
  			res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
  												   utf8string + count, NULL);
! 			XML_REPORT_ERROR(res_code == 0, ERROR, ERRCODE_INVALID_XML_CONTENT,
! 							 "invalid XML content");
  		}
  	}
  	PG_CATCH();
  	{
! 		if (doc != NULL)
! 			xmlFreeDoc(doc);
! 		if (ctxt != NULL)
! 			xmlFreeParserCtxt(ctxt);
! 		
! 		pg_xml_done();
! 		
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlFreeParserCtxt(ctxt);
! 	
! 	pg_xml_done();
! 	
  	return doc;
  }
  
*************** void
*** 1337,1366 ****
  xml_ereport(int level, int sqlcode, const char *msg)
  {
  	char	   *detail;
! 
  	/*
! 	 * It might seem that we should just pass xml_err_buf->data directly to
! 	 * errdetail.  However, we want to clean out xml_err_buf before throwing
  	 * error, in case there is another function using libxml further down the
  	 * call stack.
  	 */
! 	if (xml_err_buf->len > 0)
  	{
! 		detail = pstrdup(xml_err_buf->data);
! 		resetStringInfo(xml_err_buf);
  	}
  	else
  		detail = NULL;
  
  	if (detail)
  	{
- 		size_t		len;
- 
- 		/* libxml error messages end in '\n'; get rid of it */
- 		len = strlen(detail);
- 		if (len > 0 && detail[len - 1] == '\n')
- 			detail[len - 1] = '\0';
- 
  		ereport(level,
  				(errcode(sqlcode),
  				 errmsg("%s", msg),
--- 1408,1430 ----
  xml_ereport(int level, int sqlcode, const char *msg)
  {
  	char	   *detail;
! 	
  	/*
! 	 * It might seem that we should just pass xml_error_detail_buf->data directly to
! 	 * errdetail.  However, we want to clean out xml_error_detail_buf before throwing
  	 * error, in case there is another function using libxml further down the
  	 * call stack.
  	 */
! 	if (xml_error_detail_buf->len > 0)
  	{
! 		detail = pstrdup(xml_error_detail_buf->data);
! 		resetStringInfo(xml_error_detail_buf);
  	}
  	else
  		detail = NULL;
  
  	if (detail)
  	{
  		ereport(level,
  				(errcode(sqlcode),
  				 errmsg("%s", msg),
*************** xml_ereport(int level, int sqlcode, cons
*** 1376,1402 ****
  
  
  /*
!  * Error handler for libxml error messages
   */
  static void
! xml_errorHandler(void *ctxt, const char *msg,...)
  {
! 	/* Append the formatted text to xml_err_buf */
! 	for (;;)
  	{
! 		va_list		args;
! 		bool		success;
  
- 		/* Try to format the data. */
- 		va_start(args, msg);
- 		success = appendStringInfoVA(xml_err_buf, msg, args);
- 		va_end(args);
  
! 		if (success)
  			break;
! 
! 		/* Double the buffer size and try again. */
! 		enlargeStringInfo(xml_err_buf, xml_err_buf->maxlen);
  	}
  }
  
--- 1440,1566 ----
  
  
  /*
!  * Append a newline after removing all existing trailing newlines
   */
  static void
! appendStringInfoLineSeparator(StringInfo str)
  {
! 	chopStringInfoNewlines(str);
! 	
! 	if (str->len > 0)
! 		appendStringInfoChar(str, '\n');
! }
! 
! 
! /*
!  * Remove all trailing newlines
!  */
! static void
! chopStringInfoNewlines(StringInfo str)
! {
! 	while ((str->len > 0) &&
! 		   (str->data[str->len-1] == '\n'))
  	{
! 		str->data[--str->len] = '\0';
! 	}
! }
  
  
! /*
!  * Error handler for libxml errors and warnings
!  */
! static void
! xml_errorHandler(void* data, xmlErrorPtr error)
! {
! 	/* Buffer to hold the error detail */
! 	StringInfo errorBuf = makeStringInfo();
! 	
! 	/* Get parser and input context */
! 	xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr) error->ctxt;
! 	xmlParserInputPtr input = (ctxt != NULL) ? ctxt->input : NULL;
! 	xmlNodePtr node = error->node;
! 	const xmlChar* name = ((node != NULL) && (node->type == XML_ELEMENT_NODE)) ?
! 					node->name : NULL;
! 	
! 	switch (error->domain) {
! 		case XML_FROM_NONE:
! 		case XML_FROM_PARSER:
! 		case XML_FROM_MEMORY:
! 		case XML_FROM_IO:
! 			/* Accept regardless of the parsing purpose */
! 			break;
! 		
! 		default:
! 			/* Ignore during well-formedness check */
! 			if (xml_purpose == XML_PURPOSE_WELLFORMED)
! 				return;
  			break;
! 	}
! 	
! 	/* Append error message to xml_error_detail_buf */
! 	if ((error->domain == XML_FROM_PARSER) && (error->line > 0))
! 		appendStringInfo(errorBuf, "line %d: ", error->line);
! 	if (name != NULL)
! 		appendStringInfo(errorBuf, "element %s: ", name);
! 	appendStringInfo(errorBuf, "%s", error->message);
! 		
! 	/* Append context information to xml_error_detail_buf.
! 	 * xmlParserPrintFileContext() uses the *generic* error handle to 
! 	 * write the context. Since we don't want to duplicate libxml
! 	 * functionality here, we setup a generic error handler temporarily
! 	 */
! 	if (input != NULL)
! 	{
! 		/* Save generic error func and setup the xml_error_detail_buf
! 		 * appender as generic error func. This should work because
! 		 * appendStringInfo() has essentially the same signature as
! 		 * xmlGenericErrorFunc().
! 		 */
! 		xmlGenericErrorFunc errFuncSaved = xmlGenericError;
! 		void* errCtxSaved = xmlGenericErrorContext;
! 		xmlSetGenericErrorFunc(errorBuf, (xmlGenericErrorFunc)appendStringInfo);
! 		
! 		/* Generic context information */
! 		appendStringInfoLineSeparator(errorBuf);
! 		xmlParserPrintFileContext(input);
! 		
! 		/* Restore generic error func */
! 		xmlSetGenericErrorFunc(errCtxSaved, errFuncSaved);
! 	}
! 	
! 	chopStringInfoNewlines(errorBuf);
! 	
! 	/* Legacy error handling. The error flag is never set. Exists because
! 	 * the xml2 contrib module uses our error-handling infrastructure, but
! 	 * we don't want to change its behaviour since it's deprecated anyway
! 	 */
! 	if (xml_purpose == XML_PURPOSE_LEGACY)
! 	{
! 		appendStringInfoLineSeparator(xml_error_detail_buf);
! 		appendStringInfoString(xml_error_detail_buf, errorBuf->data);
! 		return;
! 	}
! 	
! 	/* We don't want to ereport() here because that'd probably leave
! 	 * libxml in an inconsistent state. Instead, we remember the
! 	 * error and ereport() from xml_ereport().
! 	 *
! 	 * Warnings and notices are reported immediatly since they don't cause a
! 	 * longjmp() out of libxml.
! 	 */
! 	if (error->level >= XML_ERR_ERROR)
! 	{
! 		appendStringInfoLineSeparator(xml_error_detail_buf);
! 		appendStringInfoString(xml_error_detail_buf, errorBuf->data);		
! 		xml_error_occured = true;
! 	}
! 	else if (error->level >= XML_ERR_WARNING)
! 	{
! 		ereport(WARNING, (errmsg("%s", errorBuf->data)));
! 	}
! 	else
! 	{
! 		ereport(NOTICE, (errmsg("%s", errorBuf->data)));
  	}
  }
  
*************** map_sql_value_to_xml_value(Datum value, 
*** 1756,1773 ****
  					xmlTextWriterPtr writer = NULL;
  					char	   *result;
  
! 					pg_xml_init();
  
  					PG_TRY();
  					{
  						buf = xmlBufferCreate();
! 						if (!buf)
! 							xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 										"could not allocate xmlBuffer");
  						writer = xmlNewTextWriterMemory(buf, 0);
! 						if (!writer)
! 							xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 										"could not allocate xmlTextWriter");
  
  						if (xmlbinary == XMLBINARY_BASE64)
  							xmlTextWriterWriteBase64(writer, VARDATA_ANY(bstr),
--- 1920,1935 ----
  					xmlTextWriterPtr writer = NULL;
  					char	   *result;
  
! 					pg_xml_init(XML_PURPOSE_OTHER);
  
  					PG_TRY();
  					{
  						buf = xmlBufferCreate();
! 						XML_REPORT_ERROR(buf != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 										 "could not allocate xmlBuffer");
  						writer = xmlNewTextWriterMemory(buf, 0);
! 						XML_REPORT_ERROR(writer != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 										 "could not allocate xmlTextWriter");
  
  						if (xmlbinary == XMLBINARY_BASE64)
  							xmlTextWriterWriteBase64(writer, VARDATA_ANY(bstr),
*************** map_sql_value_to_xml_value(Datum value, 
*** 1788,1799 ****
--- 1950,1966 ----
  							xmlFreeTextWriter(writer);
  						if (buf)
  							xmlBufferFree(buf);
+ 							
+ 						pg_xml_done();
+ 							
  						PG_RE_THROW();
  					}
  					PG_END_TRY();
  
  					xmlBufferFree(buf);
  
+ 					pg_xml_done();
+ 
  					return result;
  				}
  #endif   /* USE_LIBXML */
*************** xpath_internal(text *xpath_expr_text, xm
*** 3381,3387 ****
  	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
  	xpath_expr[xpath_len] = '\0';
  
! 	pg_xml_init();
  	xmlInitParser();
  
  	PG_TRY();
--- 3548,3554 ----
  	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
  	xpath_expr[xpath_len] = '\0';
  
! 	pg_xml_init(XML_PURPOSE_OTHER);
  	xmlInitParser();
  
  	PG_TRY();
*************** xpath_internal(text *xpath_expr_text, xm
*** 3391,3411 ****
  		 * command execution are possible)
  		 */
  		ctxt = xmlNewParserCtxt();
! 		if (ctxt == NULL)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate parser context");
  		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 		if (doc == NULL)
! 			xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 						"could not parse XML document");
  		xpathctx = xmlXPathNewContext(doc);
! 		if (xpathctx == NULL)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate XPath context");
  		xpathctx->node = xmlDocGetRootElement(doc);
! 		if (xpathctx->node == NULL)
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"could not find root XML element");
  
  		/* register namespaces, if any */
  		if (ns_count > 0)
--- 3558,3574 ----
  		 * command execution are possible)
  		 */
  		ctxt = xmlNewParserCtxt();
! 		XML_REPORT_ERROR(ctxt != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 						 "could not allocate parser context");
  		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 		XML_REPORT_ERROR(doc != NULL, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 						 "could not parse XML document");
  		xpathctx = xmlXPathNewContext(doc);
! 		XML_REPORT_ERROR(xpathctx != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 						 "could not allocate XPath context");
  		xpathctx->node = xmlDocGetRootElement(doc);
! 		XML_REPORT_ERROR(xpathctx->node != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 						 "could not find root XML element");
  
  		/* register namespaces, if any */
  		if (ns_count > 0)
*************** xpath_internal(text *xpath_expr_text, xm
*** 3432,3440 ****
  		}
  
  		xpathcomp = xmlXPathCompile(xpath_expr);
! 		if (xpathcomp == NULL)	/* TODO: show proper XPath error details */
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"invalid XPath expression");
  
  		/*
  		 * Version 2.6.27 introduces a function named
--- 3595,3602 ----
  		}
  
  		xpathcomp = xmlXPathCompile(xpath_expr);
! 		XML_REPORT_ERROR(xpathcomp != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 						 "invalid XPath expression");
  
  		/*
  		 * Version 2.6.27 introduces a function named
*************** xpath_internal(text *xpath_expr_text, xm
*** 3444,3452 ****
  		 * the same.
  		 */
  		xpathobj = xmlXPathCompiledEval(xpathcomp, xpathctx);
! 		if (xpathobj == NULL)	/* TODO: reason? */
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"could not create XPath object");
  
  		/* return empty array in cases when nothing is found */
  		if (xpathobj->nodesetval == NULL)
--- 3606,3613 ----
  		 * the same.
  		 */
  		xpathobj = xmlXPathCompiledEval(xpathcomp, xpathctx);
! 		XML_REPORT_ERROR(xpathobj != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 						 "could not create XPath object");
  
  		/* return empty array in cases when nothing is found */
  		if (xpathobj->nodesetval == NULL)
*************** xpath_internal(text *xpath_expr_text, xm
*** 3481,3486 ****
--- 3642,3650 ----
  			xmlFreeDoc(doc);
  		if (ctxt)
  			xmlFreeParserCtxt(ctxt);
+ 			
+ 		pg_xml_done();
+ 			
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
*************** xpath_internal(text *xpath_expr_text, xm
*** 3490,3495 ****
--- 3654,3661 ----
  	xmlXPathFreeContext(xpathctx);
  	xmlFreeDoc(doc);
  	xmlFreeParserCtxt(ctxt);
+ 	
+ 	pg_xml_done();
  }
  #endif   /* USE_LIBXML */
  
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 6359cd6..6735521 100644
*** a/src/include/utils/xml.h
--- b/src/include/utils/xml.h
*************** typedef enum
*** 68,74 ****
  	XML_STANDALONE_OMITTED
  }	XmlStandaloneType;
  
! extern void pg_xml_init(void);
  extern void xml_ereport(int level, int sqlcode, const char *msg);
  extern xmltype *xmlconcat(List *args);
  extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
--- 68,93 ----
  	XML_STANDALONE_OMITTED
  }	XmlStandaloneType;
  
! typedef enum
! {
! 	XML_PURPOSE_NONE /* Don't setup error handler. pg_xml_done() not required */,
! 	XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */,
! 	XML_PURPOSE_WELLFORMED /* Ignore non-parser errors, nothing else*/,
! 	XML_PURPOSE_OTHER /* Report all errors */
! } XmlPurposeType;
! 
! extern bool xml_error_occured;
! 
! #define XML_REPORT_ERROR(assertion,level,sqlcode,msg) \
! 	if (pg_xml_erroroccurred() || !(assertion)) { \
! 		xml_ereport(level, sqlcode, msg); \
! 	} \
! 	else { \
! 	}
! 
! extern void pg_xml_init(XmlPurposeType purpose);
! extern void pg_xml_done(void);
! extern bool pg_xml_erroroccurred(void);
  extern void xml_ereport(int level, int sqlcode, const char *msg);
  extern xmltype *xmlconcat(List *args);
  extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index eaa5a74..da2f290 100644
*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
  ERROR:  invalid XML content
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
  <wrong
        ^
  SELECT * FROM xmltest;
--- 8,14 ----
  ERROR:  invalid XML content
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
  <wrong
        ^
  SELECT * FROM xmltest;
*************** SELECT xmlconcat('bad', '<syntax');
*** 62,68 ****
  ERROR:  invalid XML content
  LINE 1: SELECT xmlconcat('bad', '<syntax');
                                  ^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag syntax line 1
  <syntax
         ^
  SELECT xmlconcat('<foo/>', NULL, '<?xml version="1.1" standalone="no"?><bar/>');
--- 62,68 ----
  ERROR:  invalid XML content
  LINE 1: SELECT xmlconcat('bad', '<syntax');
                                  ^
! DETAIL:  line 1: Couldn't find end of Start Tag syntax line 1
  <syntax
         ^
  SELECT xmlconcat('<foo/>', NULL, '<?xml version="1.1" standalone="no"?><bar/>');
*************** SELECT xmlparse(content '<abc>x</abc>');
*** 206,214 ****
   <abc>x</abc>
  (1 row)
  
  SELECT xmlparse(document 'abc');
  ERROR:  invalid XML document
! DETAIL:  Entity: line 1: parser error : Start tag expected, '<' not found
  abc
  ^
  SELECT xmlparse(document '<abc>x</abc>');
--- 206,259 ----
   <abc>x</abc>
  (1 row)
  
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: xmlParseEntityRef: no name
+ <invalidentity>&</invalidentity>
+                 ^
+ line 1: chunk is not well balanced
+ <invalidentity>&</invalidentity>
+                                 ^
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <undefinedentity>&idontexist;</undefinedentity>
+                              ^
+ line 1: chunk is not well balanced
+ <undefinedentity>&idontexist;</undefinedentity>
+                                                ^
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+          xmlparse          
+ ---------------------------
+  <invalidns xmlns='&lt;'/>
+ (1 row)
+ 
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+             xmlparse            
+ --------------------------------
+  <relativens xmlns='relative'/>
+ (1 row)
+ 
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <twoerrors>&idontexist;</unbalanced>
+                        ^
+ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ line 1: chunk is not well balanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
+       xmlparse       
+ ---------------------
+  <nosuchprefix:tag/>
+ (1 row)
+ 
  SELECT xmlparse(document 'abc');
  ERROR:  invalid XML document
! DETAIL:  line 1: Start tag expected, '<' not found
  abc
  ^
  SELECT xmlparse(document '<abc>x</abc>');
*************** SELECT xmlparse(document '<abc>x</abc>')
*** 217,222 ****
--- 262,309 ----
   <abc>x</abc>
  (1 row)
  
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: xmlParseEntityRef: no name
+ <invalidentity>&</abc>
+                 ^
+ line 1: Opening and ending tag mismatch: invalidentity line 1 and abc
+ <invalidentity>&</abc>
+                       ^
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <undefinedentity>&idontexist;</abc>
+                              ^
+ line 1: Opening and ending tag mismatch: undefinedentity line 1 and abc
+ <undefinedentity>&idontexist;</abc>
+                                    ^
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+          xmlparse          
+ ---------------------------
+  <invalidns xmlns='&lt;'/>
+ (1 row)
+ 
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+             xmlparse            
+ --------------------------------
+  <relativens xmlns='relative'/>
+ (1 row)
+ 
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <twoerrors>&idontexist;</unbalanced>
+                        ^
+ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
+       xmlparse       
+ ---------------------
+  <nosuchprefix:tag/>
+ (1 row)
+ 
  SELECT xmlpi(name foo);
    xmlpi  
  ---------
*************** SELECT '<>' IS NOT DOCUMENT;
*** 379,385 ****
  ERROR:  invalid XML content
  LINE 1: SELECT '<>' IS NOT DOCUMENT;
                 ^
! DETAIL:  Entity: line 1: parser error : StartTag: invalid element name
  <>
   ^
  SELECT xmlagg(data) FROM xmltest;
--- 466,472 ----
  ERROR:  invalid XML content
  LINE 1: SELECT '<>' IS NOT DOCUMENT;
                 ^
! DETAIL:  line 1: StartTag: invalid element name
  <>
   ^
  SELECT xmlagg(data) FROM xmltest;
*************** EXECUTE foo ('bad');
*** 425,431 ****
  ERROR:  invalid XML document
  LINE 1: EXECUTE foo ('bad');
                       ^
! DETAIL:  Entity: line 1: parser error : Start tag expected, '<' not found
  bad
  ^
  SET XML OPTION CONTENT;
--- 512,518 ----
  ERROR:  invalid XML document
  LINE 1: EXECUTE foo ('bad');
                       ^
! DETAIL:  line 1: Start tag expected, '<' not found
  bad
  ^
  SET XML OPTION CONTENT;
*************** SELECT xml_is_well_formed('<pg:foo xmlns
*** 679,684 ****
--- 766,801 ----
   t
  (1 row)
  
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+  xml_is_well_formed 
+ --------------------
+  t
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+  xml_is_well_formed 
+ --------------------
+  t
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
   xml_is_well_formed 
*************** SELECT xml_is_well_formed('abc');
*** 686,688 ****
--- 803,838 ----
   t
  (1 row)
  
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ ERROR:  could not parse XML document
+ DETAIL:  xmlns: '<' is not a valid URI
+ <invalidns xmlns='&lt;'/>
+                        ^
+ CONTEXT:  SQL function "xpath" statement 1
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ ERROR:  could not parse XML document
+ DETAIL:  Namespace prefix nosuchprefix on tag is not defined
+ <nosuchprefix:tag/>
+                  ^
+ CONTEXT:  SQL function "xpath" statement 1
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+ WARNING:  xmlns: URI relative is not absolute
+ <relativens xmlns='relative'/>
+                             ^
+ CONTEXT:  SQL function "xpath" statement 1
+                 xpath                 
+ --------------------------------------
+  {"<relativens xmlns=\"relative\"/>"}
+ (1 row)
+ 
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 711b435..7cbc174 100644
*** a/src/test/regress/expected/xml_1.out
--- b/src/test/regress/expected/xml_1.out
*************** SELECT xmlparse(content '<abc>x</abc>');
*** 172,177 ****
--- 172,201 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SELECT xmlparse(document 'abc');
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
*************** SELECT xmlparse(document '<abc>x</abc>')
*** 180,185 ****
--- 204,233 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SELECT xmlpi(name foo);
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
*************** SELECT xml_is_well_formed('<pg:foo xmlns
*** 627,634 ****
--- 675,731 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ LINE 1: SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+                            ^
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ LINE 1: SELECT xpath('/*', '<nosuchprefix:tag/>');
+                            ^
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ LINE 1: SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+                            ^
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 717a1e7..03830b9 100644
*** a/src/test/regress/sql/xml.sql
--- b/src/test/regress/sql/xml.sql
*************** SELECT xmlelement(name foo, xmlattribute
*** 62,70 ****
--- 62,82 ----
  
  SELECT xmlparse(content 'abc');
  SELECT xmlparse(content '<abc>x</abc>');
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
  
  SELECT xmlparse(document 'abc');
  SELECT xmlparse(document '<abc>x</abc>');
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
  
  
  SELECT xmlpi(name foo);
*************** SELECT xml_is_well_formed('<foo><bar>baz
*** 208,213 ****
--- 220,247 ----
  SELECT xml_is_well_formed('<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
  SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>');
  SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</pg:foo>');
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
  
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
+ 
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ 
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ 
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
#6Noah Misch
noah@leadboat.com
In reply to: Florian Pflug (#5)
1 attachment(s)
Re: Another issue with invalid XML values

Hi Florian,

I tested this patch using two systems, a CentOS 5 box with
libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2
2.6.31.dfsg-2ubuntu1.6. Both failed to build with this error:

xml.c: In function `pg_xml_init':
xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this function)
xml.c:934: error: (Each undeclared identifier is reported only once
xml.c:934: error: for each function it appears in.)
make[4]: *** [xml.o] Error 1

It seems `xmlStructuredErrorContext' was added rather recently. It's not
obvious which version added it, but 2.7.8 has it and 2.7.2 does not. Looking
at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for
both structured and generic handler functions. Based on that, I tried with a
change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our
xml.c. I ran the test suite and hit some failures in the xml test; see
attached regression.diffs. I received warnings where the tests expected
nothing or expected errors. Looks like my version of libxml2 (2.6.26 in this
case) classifies some of these messages differently.

I suggest testing the next version with both libxml2 2.6.23, the minimum
supported version, and libxml2 2.7.8, the latest version.

On Thu, Jun 09, 2011 at 06:11:46PM +0200, Florian Pflug wrote:

On Jun2, 2011, at 01:34 , Florian Pflug wrote:

On Jun2, 2011, at 00:02 , Noah Misch wrote:

On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

Sounds sensible. Will this impose any new libxml2 version dependency?

xmlSetStructuredErrorFunc() seems to be available starting with libxml 2.6.0,
release on Oct 20, 2003. Since we already require the version to be >= 2.6.23,
we should be OK.

I won't have access to my PC the next few days, but I'll try to come up with
a patch some time next week.

Phew... I did manage to produce a patch, but it was way more work than I
had intended to put into this.

As it turns out, you loose the nicely formatted context information that
libxml2 provides via the generic error func once you switch to structured
error reporting. Registering handlers for both doesn't help either, since
the generic error handler isn't called once you register a structured one.

Fortunately, libxml does export xmlParserPrintFileContext() which
generates these context messages. It, however, doesn't return a string,
but instead passes them to the generic error handler (this time, independent
from whether a structural error handler is registered or not).

Interesting API, there.

As it stood, the code assumed that all third-party library re-install
their libxml error handlers before each library call, and thus didn't
bother to restore the old error handler itself. Since I revamped the
error handling anyway, I removed that requirement. There is now a
function pg_xml_done() which restores the original error handler that
we overwrote in pg_xml_init().

Seems reasonable.

I also realized that some libxml error (like undefined namespace prefixes)
must be ignored during xmlparse() and friends. Otherwise, it becomes
impossible to build XML documents from individual fragments. pg_xml_init()
therefore now takes an argument which specifies how strict the error
checking is supposed to be. For the moment, only XPATH() uses the strict
mode in which we report all errors. XMLPARSE() and friends only report
parse errors, not namespace errors.

Seems fair offhand. We must preserve the invariant that any object with type
"xml" can be passed through xml_out() and then be accepted by xml_in(). I'm
not seeing any specific risks there, but I figure I'd mention it.

I couldn't reconcile this description with the code. I do see that
xpath_internal() uses XML_PURPOSE_OTHER, but so does xmlelement(). Why is
xmlelement() also ready for the strictest error reporting?

Also note that we may be able to be more strict when xmloption = document.

Finally, I had to adjust contrib/xml2 because it uses some parts of
the core XML support like pg_xml_init().

Heres the indended behaviour with the patch applied:
----------------------------------------------------

We always use structured error handling. For now, the error messages
pretty much resemble the old ones, but it's now easy to add additional
information.

XMLPARSE() and casting to XML check for parse errors only, like they do
without the patch. They're also capable of reporting warnings, but I
didn't find a case where the libxml parser generates a warning.

XPATH() reports all errors and warnings. Trying to use XPATH() on
a document with e.g. inconsistent namespace usage or invalid
namespace URIs therefore now raises an error. This is *necessary*
because libxml's XPath evaluator gets confused if it encounters
e.g. invalid namespace URI and outputs invalid XML in response.

Ideally, an invalid namespace URI should always be an error. While namespace
prefixes could become valid as you assemble a larger document, a namespace
URI's validity is context-free.

contrib/xml2's behaviour hasn't changed.

Patch is attached, and comments are welcome.

As far as the code itself, a few comments:

The patch adds some trailing whitespace.

*** a/contrib/xml2/xpath.c
--- b/contrib/xml2/xpath.c
*************** xpath_table(PG_FUNCTION_ARGS)
*** 720,725 ****
--- 726,732 ----
if (comppath == NULL)
{
xmlFreeDoc(doctree);
+ 						pg_xml_done();
xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
"XPath Syntax Error");
}

Remembering to put a pg_xml_done() in every relevant elog(ERROR, ...) path
seems fragile. How about using RegisterXactCallback/RegisterSubXactCallback
in pgxml_parser_init() to handle those cases? You can also use it to Assert
at non-abort transaction shutdown that pg_xml_done() was called as needed.

*************** xmlelement(XmlExprState *xmlExpr, ExprCo
*** 597,614 ****
}

/* now safe to run libxml */
! pg_xml_init();

PG_TRY();
{
buf = xmlBufferCreate();
! if (!buf)
! xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! "could not allocate xmlBuffer");
writer = xmlNewTextWriterMemory(buf, 0);
! if (!writer)
! xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! "could not allocate xmlTextWriter");

xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);

--- 607,622 ----
}

/* now safe to run libxml */
! pg_xml_init(XML_PURPOSE_OTHER);

PG_TRY();
{
buf = xmlBufferCreate();
! XML_REPORT_ERROR(buf != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! "could not allocate xmlBuffer");
writer = xmlNewTextWriterMemory(buf, 0);
! XML_REPORT_ERROR(writer != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! "could not allocate xmlTextWriter");

xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);

Consider renaming XML_REPORT_ERROR it to something like XML_CHECK_ERROR,
emphasizing that it wraps a conditional.

*************** xml_parse(text *data, XmlOptionType xmlo
*** 1175,1182 ****
int32 len;
xmlChar *string;
xmlChar *utf8string;
! xmlParserCtxtPtr ctxt;
! xmlDocPtr doc;

len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
string = xml_text2xmlChar(data);
--- 1244,1251 ----
int32		len;
xmlChar    *string;
xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt = NULL;
! 	xmlDocPtr	doc = NULL;

len = VARSIZE(data) - VARHDRSZ; /* will be useful later */
string = xml_text2xmlChar(data);

Is this change needed?

*************** void
*** 1337,1366 ****
xml_ereport(int level, int sqlcode, const char *msg)
{
char *detail;
!
/*
! * It might seem that we should just pass xml_err_buf->data directly to
! * errdetail. However, we want to clean out xml_err_buf before throwing
* error, in case there is another function using libxml further down the
* call stack.
*/
! if (xml_err_buf->len > 0)
{
! detail = pstrdup(xml_err_buf->data);
! resetStringInfo(xml_err_buf);
}
else
detail = NULL;

if (detail)
{
- 		size_t		len;
- 
- 		/* libxml error messages end in '\n'; get rid of it */
- 		len = strlen(detail);
- 		if (len > 0 && detail[len - 1] == '\n')
- 			detail[len - 1] = '\0';
- 
ereport(level,
(errcode(sqlcode),
errmsg("%s", msg),
--- 1408,1430 ----
xml_ereport(int level, int sqlcode, const char *msg)
{
char	   *detail;
! 	
/*
! 	 * It might seem that we should just pass xml_error_detail_buf->data directly to
! 	 * errdetail.  However, we want to clean out xml_error_detail_buf before throwing
* error, in case there is another function using libxml further down the
* call stack.
*/
! 	if (xml_error_detail_buf->len > 0)
{
! 		detail = pstrdup(xml_error_detail_buf->data);
! 		resetStringInfo(xml_error_detail_buf);

The xml_err_buf -> xml_error_detail_buf change should be its own patch, if you
think it's important. Changing the name at the same time makes it harder to
verify this patch.

}
else
detail = NULL;

if (detail)
{
ereport(level,
(errcode(sqlcode),
errmsg("%s", msg),
*************** xml_ereport(int level, int sqlcode, cons
*** 1376,1402 ****

/*
! * Error handler for libxml error messages
*/
static void
! xml_errorHandler(void *ctxt, const char *msg,...)
{
! /* Append the formatted text to xml_err_buf */
! for (;;)
{
! va_list args;
! bool success;

- /* Try to format the data. */
- va_start(args, msg);
- success = appendStringInfoVA(xml_err_buf, msg, args);
- va_end(args);

! if (success)
break;
!
! /* Double the buffer size and try again. */
! enlargeStringInfo(xml_err_buf, xml_err_buf->maxlen);
}
}

--- 1440,1566 ----

/*
! * Append a newline after removing all existing trailing newlines
*/
static void
! appendStringInfoLineSeparator(StringInfo str)
{
! chopStringInfoNewlines(str);
!
! if (str->len > 0)
! appendStringInfoChar(str, '\n');
! }
!
!
! /*
! * Remove all trailing newlines
! */
! static void
! chopStringInfoNewlines(StringInfo str)
! {
! while ((str->len > 0) &&
! (str->data[str->len-1] == '\n'))
{
! str->data[--str->len] = '\0';
! }
! }

Is there something about this patch's other changes that make it necessary to
strip multiple newlines, vs. one newline as we did before, and to do it in a
different place in the code?

! /*
! * Error handler for libxml errors and warnings
! */
! static void
! xml_errorHandler(void* data, xmlErrorPtr error)
! {
! /* Buffer to hold the error detail */
! StringInfo errorBuf = makeStringInfo();
!
! /* Get parser and input context */
! xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr) error->ctxt;
! xmlParserInputPtr input = (ctxt != NULL) ? ctxt->input : NULL;
! xmlNodePtr node = error->node;
! const xmlChar* name = ((node != NULL) && (node->type == XML_ELEMENT_NODE)) ?
! node->name : NULL;
!
! switch (error->domain) {
! case XML_FROM_NONE:
! case XML_FROM_PARSER:
! case XML_FROM_MEMORY:
! case XML_FROM_IO:
! /* Accept regardless of the parsing purpose */
! break;
!
! default:
! /* Ignore during well-formedness check */
! if (xml_purpose == XML_PURPOSE_WELLFORMED)
! return;
break;
! }
!
! /* Append error message to xml_error_detail_buf */
! if ((error->domain == XML_FROM_PARSER) && (error->line > 0))
! appendStringInfo(errorBuf, "line %d: ", error->line);
! if (name != NULL)
! appendStringInfo(errorBuf, "element %s: ", name);
! appendStringInfo(errorBuf, "%s", error->message);
!
! /* Append context information to xml_error_detail_buf.
! * xmlParserPrintFileContext() uses the *generic* error handle to
! * write the context. Since we don't want to duplicate libxml
! * functionality here, we setup a generic error handler temporarily
! */
! if (input != NULL)
! {
! /* Save generic error func and setup the xml_error_detail_buf
! * appender as generic error func. This should work because
! * appendStringInfo() has essentially the same signature as
! * xmlGenericErrorFunc().
! */
! xmlGenericErrorFunc errFuncSaved = xmlGenericError;
! void* errCtxSaved = xmlGenericErrorContext;
! xmlSetGenericErrorFunc(errorBuf, (xmlGenericErrorFunc)appendStringInfo);
!
! /* Generic context information */
! appendStringInfoLineSeparator(errorBuf);
! xmlParserPrintFileContext(input);
!
! /* Restore generic error func */
! xmlSetGenericErrorFunc(errCtxSaved, errFuncSaved);
! }
!
! chopStringInfoNewlines(errorBuf);
!
! /* Legacy error handling. The error flag is never set. Exists because
! * the xml2 contrib module uses our error-handling infrastructure, but
! * we don't want to change its behaviour since it's deprecated anyway
! */
! if (xml_purpose == XML_PURPOSE_LEGACY)
! {
! appendStringInfoLineSeparator(xml_error_detail_buf);
! appendStringInfoString(xml_error_detail_buf, errorBuf->data);
! return;
! }
!
! /* We don't want to ereport() here because that'd probably leave
! * libxml in an inconsistent state. Instead, we remember the
! * error and ereport() from xml_ereport().
! *
! * Warnings and notices are reported immediatly since they don't cause a

typo

! * longjmp() out of libxml.
! */
! if (error->level >= XML_ERR_ERROR)
! {
! appendStringInfoLineSeparator(xml_error_detail_buf);
! appendStringInfoString(xml_error_detail_buf, errorBuf->data);
! xml_error_occured = true;
! }
! else if (error->level >= XML_ERR_WARNING)
! {
! ereport(WARNING, (errmsg("%s", errorBuf->data)));
! }
! else
! {
! ereport(NOTICE, (errmsg("%s", errorBuf->data)));
}
}

diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 6359cd6..6735521 100644
*** a/src/include/utils/xml.h
--- b/src/include/utils/xml.h
*************** typedef enum
*** 68,74 ****
XML_STANDALONE_OMITTED
}	XmlStandaloneType;
! extern void pg_xml_init(void);
extern void xml_ereport(int level, int sqlcode, const char *msg);
extern xmltype *xmlconcat(List *args);
extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
--- 68,93 ----
XML_STANDALONE_OMITTED
}	XmlStandaloneType;

! typedef enum
! {
! XML_PURPOSE_NONE /* Don't setup error handler. pg_xml_done() not required */,

It's odd to have a purpose of "NONE" that pg_xml_init() callers can actually
use. How about XML_PURPOSE_TRIVIAL, for callers that only call libxml2
functions that can't error? Also, I think it would be better to require a
call to pg_xml_done() in all cases, just to keep things simple. Actually, I
wonder if it's worth having XML_PURPOSE_TRIVIAL for the one user thereof. It
doesn't save much.

! XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */,

It's fine to set the flag for legacy users, considering they could just not
read it. The important distinction seems to be that we don't emit warnings or
notices in this case. Is that correct? If so, the comment should reflect
that emphasis. Then, consider updating the flag unconditionally.

! XML_PURPOSE_WELLFORMED /* Ignore non-parser errors, nothing else*/,
! XML_PURPOSE_OTHER /* Report all errors */
! } XmlPurposeType;

This is currently about error reporting only. How about:

typedef enum
{
XML_ER_NONE, /* no calls to libxml2 functions permitted */
XML_ER_WELLFORMED, /* report errors and warnings from the parser only */
XML_ER_LEGACY, /* report all errors, no warnings */
XML_ER_ALL /* report all errors and warnings */
} XmlErrorReporting;

!
! extern bool xml_error_occured;
!
! #define XML_REPORT_ERROR(assertion,level,sqlcode,msg) \
! if (pg_xml_erroroccurred() || !(assertion)) { \
! xml_ereport(level, sqlcode, msg); \
! } \
! else { \
! }

Consider moving this definition into xml.c. It's more of an internal
shorthand for keeping that file simple than part of the API.

! 
! extern void pg_xml_init(XmlPurposeType purpose);
! extern void pg_xml_done(void);
! extern bool pg_xml_erroroccurred(void);
extern void xml_ereport(int level, int sqlcode, const char *msg);
extern xmltype *xmlconcat(List *args);
extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index eaa5a74..da2f290 100644
*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
<wrong
^
SELECT * FROM xmltest;
--- 8,14 ----
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1

Any reason to change the error text this way?

Thanks,
nm

Attachments:

regression.diffstext/plain; charset=us-asciiDownload
*** /home/nmisch/src/cvs/postgresql/src/test/regress/expected/xml.out	2011-06-20 00:51:56.000000000 -0400
--- /home/nmisch/src/cvs/postgresql/src/test/regress/results/xml.out	2011-06-20 01:19:20.000000000 -0400
***************
*** 223,234 ****
--- 223,240 ----
  <undefinedentity>&idontexist;</undefinedentity>
                                                 ^
  SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+ WARNING:  line 1: xmlns: < not a valid URI
+ <invalidns xmlns='&lt;'/>
+                        ^
           xmlparse          
  ---------------------------
   <invalidns xmlns='&lt;'/>
  (1 row)
  
  SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+ WARNING:  line 1: xmlns: URI relative is not absolute
+ <relativens xmlns='relative'/>
+                             ^
              xmlparse            
  --------------------------------
   <relativens xmlns='relative'/>
***************
*** 279,290 ****
--- 285,302 ----
  <undefinedentity>&idontexist;</abc>
                                     ^
  SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+ WARNING:  line 1: xmlns: < not a valid URI
+ <invalidns xmlns='&lt;'/>
+                        ^
           xmlparse          
  ---------------------------
   <invalidns xmlns='&lt;'/>
  (1 row)
  
  SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+ WARNING:  line 1: xmlns: URI relative is not absolute
+ <relativens xmlns='relative'/>
+                             ^
              xmlparse            
  --------------------------------
   <relativens xmlns='relative'/>
***************
*** 779,790 ****
--- 791,808 ----
  (1 row)
  
  SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+ WARNING:  line 1: xmlns: < not a valid URI
+ <invalidns xmlns='&lt;'/>
+                        ^
   xml_is_well_formed 
  --------------------
   t
  (1 row)
  
  SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+ WARNING:  line 1: xmlns: URI relative is not absolute
+ <relativens xmlns='relative'/>
+                             ^
   xml_is_well_formed 
  --------------------
   t
***************
*** 812,822 ****
  -- which is invalid beecause '<' may not appear un-escaped in
  -- attribute values.
  SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
! ERROR:  could not parse XML document
! DETAIL:  xmlns: '<' is not a valid URI
  <invalidns xmlns='&lt;'/>
                         ^
  CONTEXT:  SQL function "xpath" statement 1
  -- Again, the XML isn't well-formed for namespace purposes
  SELECT xpath('/*', '<nosuchprefix:tag/>');
  ERROR:  could not parse XML document
--- 830,849 ----
  -- which is invalid beecause '<' may not appear un-escaped in
  -- attribute values.
  SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
! WARNING:  line 1: xmlns: < not a valid URI
! <invalidns xmlns='&lt;'/>
!                        ^
! LINE 1: SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
!                            ^
! WARNING:  line 1: xmlns: < not a valid URI
  <invalidns xmlns='&lt;'/>
                         ^
  CONTEXT:  SQL function "xpath" statement 1
+             xpath             
+ ------------------------------
+  {"<invalidns xmlns=\"<\"/>"}
+ (1 row)
+ 
  -- Again, the XML isn't well-formed for namespace purposes
  SELECT xpath('/*', '<nosuchprefix:tag/>');
  ERROR:  could not parse XML document
***************
*** 827,833 ****
  -- XPath deprecates relative namespaces, but they're not supposed to
  -- thrown an error, only a warning.
  SELECT xpath('/*', '<relativens xmlns=''relative''/>');
! WARNING:  xmlns: URI relative is not absolute
  <relativens xmlns='relative'/>
                              ^
  CONTEXT:  SQL function "xpath" statement 1
--- 854,865 ----
  -- XPath deprecates relative namespaces, but they're not supposed to
  -- thrown an error, only a warning.
  SELECT xpath('/*', '<relativens xmlns=''relative''/>');
! WARNING:  line 1: xmlns: URI relative is not absolute
! <relativens xmlns='relative'/>
!                             ^
! LINE 1: SELECT xpath('/*', '<relativens xmlns=''relative''/>');
!                            ^
! WARNING:  line 1: xmlns: URI relative is not absolute
  <relativens xmlns='relative'/>
                              ^
  CONTEXT:  SQL function "xpath" statement 1

======================================================================

#7Florian Pflug
fgp@phlo.org
In reply to: Noah Misch (#6)
Re: Another issue with invalid XML values

Hi

Thanks for the extensive review, it's very much appreciated!

On Jun20, 2011, at 19:57 , Noah Misch wrote:

I tested this patch using two systems, a CentOS 5 box with
libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2
2.6.31.dfsg-2ubuntu1.6. Both failed to build with this error:

xml.c: In function `pg_xml_init':
xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this function)
xml.c:934: error: (Each undeclared identifier is reported only once
xml.c:934: error: for each function it appears in.)
make[4]: *** [xml.o] Error 1

It seems `xmlStructuredErrorContext' was added rather recently. It's not
obvious which version added it, but 2.7.8 has it and 2.7.2 does not. Looking
at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for
both structured and generic handler functions. Based on that, I tried with a
change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our
xml.c. I ran the test suite and hit some failures in the xml test; see
attached regression.diffs. I received warnings where the tests expected
nothing or expected errors.

Great :-(. I wonder if maybe I should simply remove the part of the patch
which try to restore the error handler and context in pg_xml_done(). I've
added that because I feared that some 3rd-party libraries setup their libxml
error handle just once, not before every library class. The code previously
didn't care about this either, but since we're using structured instead of
generic error reporting now, we'd now collide with 3-rd party libs which
also use structured error handling, whereas previously that would have worked.
OTOH, once there's more than one such library...

Whats your stand on this?

Looks like my version of libxml2 (2.6.26 in this
case) classifies some of these messages differently.

Hm, that's exactly what I hoped that this patch would *prevent* from happening..
Will look into this.

On Thu, Jun 09, 2011 at 06:11:46PM +0200, Florian Pflug wrote:

On Jun2, 2011, at 01:34 , Florian Pflug wrote:
I also realized that some libxml error (like undefined namespace prefixes)
must be ignored during xmlparse() and friends. Otherwise, it becomes
impossible to build XML documents from individual fragments. pg_xml_init()
therefore now takes an argument which specifies how strict the error
checking is supposed to be. For the moment, only XPATH() uses the strict
mode in which we report all errors. XMLPARSE() and friends only report
parse errors, not namespace errors.

Seems fair offhand. We must preserve the invariant that any object with type
"xml" can be passed through xml_out() and then be accepted by xml_in(). I'm
not seeing any specific risks there, but I figure I'd mention it.

I couldn't reconcile this description with the code. I do see that
xpath_internal() uses XML_PURPOSE_OTHER, but so does xmlelement(). Why is
xmlelement() also ready for the strictest error reporting?

xmlelement() doesn't use libxml's parser, only the xml writer. I didn't
want to suppress any errors the writer might raise (I'm not sure it raises
error at all, though).

Also note that we may be able to be more strict when xmloption = document.

Yeah, probably. I think that better done in a separate patch, though - I've
tried not to change existing behaviour with this patch except where absolutely
necessary (i.e. for XPATH)

Ideally, an invalid namespace URI should always be an error. While namespace
prefixes could become valid as you assemble a larger document, a namespace
URI's validity is context-free.

I agree. That, however, would require xmlelement() to check all xmlns*
attributes, and I didn't find a way to do that with libxml() (Well,
other than to parse the element after creating it, but that's a huge
kludge). So I left that issue for a later time...

The patch adds some trailing whitespace.

Ups, sorry for that. Will fix.

Remembering to put a pg_xml_done() in every relevant elog(ERROR, ...) path
seems fragile. How about using RegisterXactCallback/RegisterSubXactCallback
in pgxml_parser_init() to handle those cases? You can also use it to Assert
at non-abort transaction shutdown that pg_xml_done() was called as needed.

Hm, isn't that a bit fragile too? It seems that PG_TRY/PG_CATCH blocks don't
necessarily create sub-transactions, do they?

Also, most elog() paths already contained xmlFree() calls, because libxml
seemingly cannot be made to use context-based memory management. So one
already needs to be pretty careful when touching these...

Anyway, should we decide to give up on trying to restore the error handler
and context, we could get rid of pg_xml_done() again... (See above for the
details on that)

Consider renaming XML_REPORT_ERROR it to something like XML_CHECK_ERROR,
emphasizing that it wraps a conditional.

Hm, I think it was named XML_CHECK_ERROR at one point, and I changed it
because I felt it didn't convey the fact that it's actually ereport()
and not just return false on error. But I agree that XML_REPORT_ERROR isn't
very self-explanatory, either. What about XML_REPORT_PENDING_ERROR()
or XML_CHECK_AND_REPORT_ERROR()?

*************** xml_parse(text *data, XmlOptionType xmlo
*** 1175,1182 ****
int32 len;
xmlChar *string;
xmlChar *utf8string;
! xmlParserCtxtPtr ctxt;
! xmlDocPtr doc;

len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
string = xml_text2xmlChar(data);
--- 1244,1251 ----
int32		len;
xmlChar    *string;
xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt = NULL;
! 	xmlDocPtr	doc = NULL;

len = VARSIZE(data) - VARHDRSZ; /* will be useful later */
string = xml_text2xmlChar(data);

Is this change needed?

For "doc" it definitely is. Since we can no report an error even if
the parser returned a valid doc, there's now a "if (doc) xmlFree(doc)"
in the PG_CATCH patch.

For "ctxt", strictly, speaking no. But only because "ctxt = xmlNewParserCtxt()"
is the first line in the PG_TRY block, and therefore made the "xmlFree(ctxt)"
unconditional. But doing it this way seems more consistent and less error-prone.

The xml_err_buf -> xml_error_detail_buf change should be its own patch, if you
think it's important. Changing the name at the same time makes it harder to
verify this patch.

Agreed. That name change is a left-over from a version where there were two
error buffers. Will fix.

Is there something about this patch's other changes that make it necessary to
strip multiple newlines, vs. one newline as we did before, and to do it in a
different place in the code?

There previously wasn't a separate function for that. I made it strip off
multiple newlines because that allowed me to keep appendStringInfoLineSeparator()
simple while still making sure that the end result is exactly one newline
at the end of the string. Previously, the coded depended on libxml always
adding a newline at the end of messages, which I'm not even sure it true
for messages reported via the structured error handler.

It's odd to have a purpose of "NONE" that pg_xml_init() callers can actually
use. How about XML_PURPOSE_TRIVIAL, for callers that only call libxml2
functions that can't error? Also, I think it would be better to require a
call to pg_xml_done() in all cases, just to keep things simple. Actually, I
wonder if it's worth having XML_PURPOSE_TRIVIAL for the one user thereof. It
doesn't save much.

Yeah, that name really is a perfect example of horrible naming ;-)

The pg_xml_done()-not-required semantics are necessary though. The problem is
that parse_xml_decl() is sometimes called after the caller has already called
pg_xml_init() himself. Not always though, so one cannot simply remove the
pg_xml_init() call from parse_xml_decl() - it might then use libxml without
it being initialized. So I made pg_xml_init() skip the error-handler setup
when purpose == XML_PURPOSE_NONE. This requires then, however, that pg_xml_done()
is *not* called, because it'd restore settings that were never saved (well,
actually it'd tripe an assert that protects against that...).

But yeah, this is all quite contorted. Maybe pg_xml_init() should simply be
split into two functions, one which initialized the library and one which
sets up error handling. That means that two calls instead of one are required,
but it makes the purpose of the individual functions much clearer...

! XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */,

It's fine to set the flag for legacy users, considering they could just not
read it. The important distinction seems to be that we don't emit warnings or
notices in this case. Is that correct? If so, the comment should reflect
that emphasis. Then, consider updating the flag unconditionally.

I took me a while to remember while I did it that way, but I think I have now.

I initialled wanted to add an Assert(!xml_error_occurred) to catch any
missing XML_REPORT_ERROR() calls. I seems to have forgotten to do that,
though...

So I guess I should either refrain from setting the flag as you suggested,
or add such an Assert(). I think I very slightly prefer the latter, what
do you think?

! XML_PURPOSE_WELLFORMED /* Ignore non-parser errors, nothing else*/,
! XML_PURPOSE_OTHER /* Report all errors */
! } XmlPurposeType;

This is currently about error reporting only. How about:

typedef enum
{
XML_ER_NONE, /* no calls to libxml2 functions permitted */
XML_ER_WELLFORMED, /* report errors and warnings from the parser only */
XML_ER_LEGACY, /* report all errors, no warnings */
XML_ER_ALL /* report all errors and warnings */
} XmlErrorReporting;

Sounds good. Will do.

! extern bool xml_error_occured;
!
! #define XML_REPORT_ERROR(assertion,level,sqlcode,msg) \
! if (pg_xml_erroroccurred() || !(assertion)) { \
! xml_ereport(level, sqlcode, msg); \
! } \
! else { \
! }

Consider moving this definition into xml.c. It's more of an internal
shorthand for keeping that file simple than part of the API.

The idea was that the xml stuff in contrib/ might one day be converted
to use that macro also. But yeah, should that really happen that macro
can easily be moved back to xml.h, so will fix.

! extern void pg_xml_init(XmlPurposeType purpose);
! extern void pg_xml_done(void);
! extern bool pg_xml_erroroccurred(void);
extern void xml_ereport(int level, int sqlcode, const char *msg);
extern xmltype *xmlconcat(List *args);
extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index eaa5a74..da2f290 100644
*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
<wrong
^
SELECT * FROM xmltest;
--- 8,14 ----
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1

Any reason to change the error text this way?

The "Entity:" prefix is added by libxml only for messages reported
to the generic error handler. It *doesn't* refer to entities as defined
in xml, but instead used in place of the file name if libxml
doesn't have that at hand (because it's parsing from memory).

So that "Entity:" prefix really doesn't have any meaning whatsoever.

I believe that the compatibility risk is pretty low here, and removing
the meaningless prefix makes the error message are bit nicer to read.
But if you are concerned that there maybe applications out there who
parse the error text, we could of course add it back. I must say that
I don't really know what our policy regarding error message stability is...

I'll try to produce an updated patch within a day or two.

best regards,
Florian Pflug

#8Noah Misch
noah@leadboat.com
In reply to: Florian Pflug (#7)
Re: Another issue with invalid XML values

On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:

On Jun20, 2011, at 19:57 , Noah Misch wrote:

I tested this patch using two systems, a CentOS 5 box with
libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2
2.6.31.dfsg-2ubuntu1.6. Both failed to build with this error:

xml.c: In function `pg_xml_init':
xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this function)
xml.c:934: error: (Each undeclared identifier is reported only once
xml.c:934: error: for each function it appears in.)
make[4]: *** [xml.o] Error 1

It seems `xmlStructuredErrorContext' was added rather recently. It's not
obvious which version added it, but 2.7.8 has it and 2.7.2 does not. Looking
at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for
both structured and generic handler functions. Based on that, I tried with a
change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our
xml.c. I ran the test suite and hit some failures in the xml test; see
attached regression.diffs. I received warnings where the tests expected
nothing or expected errors.

Great :-(. I wonder if maybe I should simply remove the part of the patch
which try to restore the error handler and context in pg_xml_done(). I've
added that because I feared that some 3rd-party libraries setup their libxml
error handle just once, not before every library class. The code previously
didn't care about this either, but since we're using structured instead of
generic error reporting now, we'd now collide with 3-rd party libs which
also use structured error handling, whereas previously that would have worked.
OTOH, once there's more than one such library...

Whats your stand on this?

Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
available and xmlGenericErrorContext otherwise, I would just add an Autoconf
check to identify which one to use.

I couldn't reconcile this description with the code. I do see that
xpath_internal() uses XML_PURPOSE_OTHER, but so does xmlelement(). Why is
xmlelement() also ready for the strictest error reporting?

xmlelement() doesn't use libxml's parser, only the xml writer. I didn't
want to suppress any errors the writer might raise (I'm not sure it raises
error at all, though).

Makes sense.

Also note that we may be able to be more strict when xmloption = document.

Yeah, probably. I think that better done in a separate patch, though - I've
tried not to change existing behaviour with this patch except where absolutely
necessary (i.e. for XPATH)

Likewise.

Ideally, an invalid namespace URI should always be an error. While namespace
prefixes could become valid as you assemble a larger document, a namespace
URI's validity is context-free.

I agree. That, however, would require xmlelement() to check all xmlns*
attributes, and I didn't find a way to do that with libxml() (Well,
other than to parse the element after creating it, but that's a huge
kludge). So I left that issue for a later time...

Likewise.

Remembering to put a pg_xml_done() in every relevant elog(ERROR, ...) path
seems fragile. How about using RegisterXactCallback/RegisterSubXactCallback
in pgxml_parser_init() to handle those cases? You can also use it to Assert
at non-abort transaction shutdown that pg_xml_done() was called as needed.

Hm, isn't that a bit fragile too? It seems that PG_TRY/PG_CATCH blocks don't
necessarily create sub-transactions, do they?

PG_TRY never creates a subtransaction. However, elog(ERROR, ...) aborts
either the subtransaction, or if none, the top-level transaction. Therefore,
registering the same callback with both RegisterXactCallback and
RegisterSubXactCallback ensures that it gets called for any elog/ereport
non-local exit. Then you only explicitly call pg_xml_done() in the non-error
path.

However, I see now that in the core xml.c, every error-condition pg_xml_done()
appears in a PG_CATCH block. That's plenty reliable ...

Also, most elog() paths already contained xmlFree() calls, because libxml
seemingly cannot be made to use context-based memory management. So one
already needs to be pretty careful when touching these...

... and this is a good reason to keep doing it that way. So, scratch that idea.

Consider renaming XML_REPORT_ERROR it to something like XML_CHECK_ERROR,
emphasizing that it wraps a conditional.

Hm, I think it was named XML_CHECK_ERROR at one point, and I changed it
because I felt it didn't convey the fact that it's actually ereport()
and not just return false on error. But I agree that XML_REPORT_ERROR isn't
very self-explanatory, either. What about XML_REPORT_PENDING_ERROR()
or XML_CHECK_AND_REPORT_ERROR()?

XML_CHECK_AND_REPORT_ERROR() seems fine. Or XML_CHECK_AND_EREPORT()?

On Thu, Jun 09, 2011 at 06:11:46PM +0200, Florian Pflug wrote:

*************** xml_parse(text *data, XmlOptionType xmlo
*** 1175,1182 ****
int32 len;
xmlChar *string;
xmlChar *utf8string;
! xmlParserCtxtPtr ctxt;
! xmlDocPtr doc;

len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
string = xml_text2xmlChar(data);
--- 1244,1251 ----
int32		len;
xmlChar    *string;
xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt = NULL;
! 	xmlDocPtr	doc = NULL;

len = VARSIZE(data) - VARHDRSZ; /* will be useful later */
string = xml_text2xmlChar(data);

Is this change needed?

For "doc" it definitely is. Since we can no report an error even if
the parser returned a valid doc, there's now a "if (doc) xmlFree(doc)"
in the PG_CATCH patch.

For "ctxt", strictly, speaking no. But only because "ctxt = xmlNewParserCtxt()"
is the first line in the PG_TRY block, and therefore made the "xmlFree(ctxt)"
unconditional. But doing it this way seems more consistent and less error-prone.

I see it now; thanks.

Is there something about this patch's other changes that make it necessary to
strip multiple newlines, vs. one newline as we did before, and to do it in a
different place in the code?

There previously wasn't a separate function for that. I made it strip off
multiple newlines because that allowed me to keep appendStringInfoLineSeparator()
simple while still making sure that the end result is exactly one newline
at the end of the string. Previously, the coded depended on libxml always
adding a newline at the end of messages, which I'm not even sure it true
for messages reported via the structured error handler.

Fair enough.

It's odd to have a purpose of "NONE" that pg_xml_init() callers can actually
use. How about XML_PURPOSE_TRIVIAL, for callers that only call libxml2
functions that can't error? Also, I think it would be better to require a
call to pg_xml_done() in all cases, just to keep things simple. Actually, I
wonder if it's worth having XML_PURPOSE_TRIVIAL for the one user thereof. It
doesn't save much.

Yeah, that name really is a perfect example of horrible naming ;-)

The pg_xml_done()-not-required semantics are necessary though. The problem is
that parse_xml_decl() is sometimes called after the caller has already called
pg_xml_init() himself. Not always though, so one cannot simply remove the
pg_xml_init() call from parse_xml_decl() - it might then use libxml without
it being initialized. So I made pg_xml_init() skip the error-handler setup
when purpose == XML_PURPOSE_NONE. This requires then, however, that pg_xml_done()
is *not* called, because it'd restore settings that were never saved (well,
actually it'd tripe an assert that protects against that...).

But yeah, this is all quite contorted. Maybe pg_xml_init() should simply be
split into two functions, one which initialized the library and one which
sets up error handling. That means that two calls instead of one are required,
but it makes the purpose of the individual functions much clearer...

Oh, tricky. That's an option, and the error-handler-initialization function
could just call the more-primitive one, so most callers would not need to
know. Another way is to make xml_error_initialized a counter. If it's called
a second time, increment and do nothing more. Only unregister the handler
when it's decremented to zero. No strong preference here.

! XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */,

It's fine to set the flag for legacy users, considering they could just not
read it. The important distinction seems to be that we don't emit warnings or
notices in this case. Is that correct? If so, the comment should reflect
that emphasis. Then, consider updating the flag unconditionally.

I took me a while to remember while I did it that way, but I think I have now.

I initialled wanted to add an Assert(!xml_error_occurred) to catch any
missing XML_REPORT_ERROR() calls. I seems to have forgotten to do that,
though...

So I guess I should either refrain from setting the flag as you suggested,
or add such an Assert(). I think I very slightly prefer the latter, what
do you think?

I do like the idea of that assert. How about setting the flag anyway, but
making it "Assert(xml_purpose == XML_PURPOSE_LEGACY || !xml_error_occurred)"?

*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
<wrong
^
SELECT * FROM xmltest;
--- 8,14 ----
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1

Any reason to change the error text this way?

The "Entity:" prefix is added by libxml only for messages reported
to the generic error handler. It *doesn't* refer to entities as defined
in xml, but instead used in place of the file name if libxml
doesn't have that at hand (because it's parsing from memory).

So that "Entity:" prefix really doesn't have any meaning whatsoever.

So xmlParserPrintFileContext() sends different content to the generic error
handler from what "natural" calls to the handler would send?

I believe that the compatibility risk is pretty low here, and removing
the meaningless prefix makes the error message are bit nicer to read.
But if you are concerned that there maybe applications out there who
parse the error text, we could of course add it back. I must say that
I don't really know what our policy regarding error message stability is...

If the libxml2 API makes it a major pain to preserve exact messages, I
wouldn't worry.

Thanks,
nm

#9Florian Pflug
fgp@phlo.org
In reply to: Noah Misch (#8)
1 attachment(s)
Re: Another issue with invalid XML values

Hi

Updated patch attached.

On Jun20, 2011, at 22:45 , Noah Misch wrote:

On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:

On Jun20, 2011, at 19:57 , Noah Misch wrote:

I tested this patch using two systems, a CentOS 5 box with
libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2
2.6.31.dfsg-2ubuntu1.6. Both failed to build with this error:

xml.c: In function `pg_xml_init':
xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this function)
xml.c:934: error: (Each undeclared identifier is reported only once
xml.c:934: error: for each function it appears in.)
make[4]: *** [xml.o] Error 1

It seems `xmlStructuredErrorContext' was added rather recently. It's not
obvious which version added it, but 2.7.8 has it and 2.7.2 does not. Looking
at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for
both structured and generic handler functions. Based on that, I tried with a
change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our
xml.c. I ran the test suite and hit some failures in the xml test; see
attached regression.diffs. I received warnings where the tests expected
nothing or expected errors.

Great :-(. I wonder if maybe I should simply remove the part of the patch
which try to restore the error handler and context in pg_xml_done(). I've
added that because I feared that some 3rd-party libraries setup their libxml
error handle just once, not before every library class. The code previously
didn't care about this either, but since we're using structured instead of
generic error reporting now, we'd now collide with 3-rd party libs which
also use structured error handling, whereas previously that would have worked.
OTOH, once there's more than one such library...

Whats your stand on this?

Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
available and xmlGenericErrorContext otherwise, I would just add an Autoconf
check to identify which one to use.

It seems that before xmlStructuredErrorContext was introduced, the structured
and the generic error handler shared an error context, so doing what you
suggested looks sensible.

I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
this the right way. I basically copied a similar check (for setlongjump AFAIR)
and adapted it to check for xmlStructuredErrorContext instead.

Consider renaming XML_REPORT_ERROR it to something like XML_CHECK_ERROR,
emphasizing that it wraps a conditional.

Hm, I think it was named XML_CHECK_ERROR at one point, and I changed it
because I felt it didn't convey the fact that it's actually ereport()
and not just return false on error. But I agree that XML_REPORT_ERROR isn't
very self-explanatory, either. What about XML_REPORT_PENDING_ERROR()
or XML_CHECK_AND_REPORT_ERROR()?

XML_CHECK_AND_REPORT_ERROR() seems fine. Or XML_CHECK_AND_EREPORT()?

I ended up picking XML_CHECK_AND_EREPORT().

It's odd to have a purpose of "NONE" that pg_xml_init() callers can actually
use. How about XML_PURPOSE_TRIVIAL, for callers that only call libxml2
functions that can't error? Also, I think it would be better to require a
call to pg_xml_done() in all cases, just to keep things simple. Actually, I
wonder if it's worth having XML_PURPOSE_TRIVIAL for the one user thereof. It
doesn't save much.

Yeah, that name really is a perfect example of horrible naming ;-)

The pg_xml_done()-not-required semantics are necessary though. The problem is
that parse_xml_decl() is sometimes called after the caller has already called
pg_xml_init() himself. Not always though, so one cannot simply remove the
pg_xml_init() call from parse_xml_decl() - it might then use libxml without
it being initialized. So I made pg_xml_init() skip the error-handler setup
when purpose == XML_PURPOSE_NONE. This requires then, however, that pg_xml_done()
is *not* called, because it'd restore settings that were never saved (well,
actually it'd tripe an assert that protects against that...).

But yeah, this is all quite contorted. Maybe pg_xml_init() should simply be
split into two functions, one which initialized the library and one which
sets up error handling. That means that two calls instead of one are required,
but it makes the purpose of the individual functions much clearer...

Oh, tricky. That's an option, and the error-handler-initialization function
could just call the more-primitive one, so most callers would not need to
know. Another way is to make xml_error_initialized a counter. If it's called
a second time, increment and do nothing more. Only unregister the handler
when it's decremented to zero. No strong preference here.

After some pondering, I picked your first suggestion, i.e. factored
pg_xml_init_library() out of pg_xml_init() and made the one caller who
used XML_PURPOSE_NONE use that instead().

! XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */,

It's fine to set the flag for legacy users, considering they could just not
read it. The important distinction seems to be that we don't emit warnings or
notices in this case. Is that correct? If so, the comment should reflect
that emphasis. Then, consider updating the flag unconditionally.

I took me a while to remember while I did it that way, but I think I have now.

I initialled wanted to add an Assert(!xml_error_occurred) to catch any
missing XML_REPORT_ERROR() calls. I seems to have forgotten to do that,
though...

So I guess I should either refrain from setting the flag as you suggested,
or add such an Assert(). I think I very slightly prefer the latter, what
do you think?

I do like the idea of that assert. How about setting the flag anyway, but
making it "Assert(xml_purpose == XML_PURPOSE_LEGACY || !xml_error_occurred)"?

I added the Assert, but didn't make the setting of the error flag unconditional.

If I did that, XML_CHECK_AND_EREPORT() would stop working for the LEGACY
case. Now, that case currently isn't exercised, but breaking nevertheless
seemed wrong to me.

*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
<wrong
^
SELECT * FROM xmltest;
--- 8,14 ----
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1

Any reason to change the error text this way?

The "Entity:" prefix is added by libxml only for messages reported
to the generic error handler. It *doesn't* refer to entities as defined
in xml, but instead used in place of the file name if libxml
doesn't have that at hand (because it's parsing from memory).

So that "Entity:" prefix really doesn't have any meaning whatsoever.

So xmlParserPrintFileContext() sends different content to the generic error
handler from what "natural" calls to the handler would send?

xmlParserPrintFileContext() only generates the "context" part of the error
message. In the example above, these are the two lines
<wrong
^
These lines don't contain the "Entity:" prefix - neither with the patch
attached nor without.

(Btw: Yeah, with generic error handling, libxml sometimes calls
the error handler multiple times for a single error, each time passing
parts of the whole error message. These "generic error handlers" really
aren't error handlers at all, but rather message output callbacks...)

I believe that the compatibility risk is pretty low here, and removing
the meaningless prefix makes the error message are bit nicer to read.
But if you are concerned that there maybe applications out there who
parse the error text, we could of course add it back. I must say that
I don't really know what our policy regarding error message stability is...

If the libxml2 API makes it a major pain to preserve exact messages, I
wouldn't worry.

It'd only require us to prepend "Entity: " to the message string, so
there's no pain there. The question is rather whether we want to continue
having a pure noise word in front of every libxml error message for
the sake of compatibility.

I vote "Nay", on the grounds that I estimate the actual breakage from
such a change to be approximately zero. Plus the fact that libxml
error messages aren't completely stable either. I had to suppress the
DETAIL output for one of the regression tests to make them work for
both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
URI in it's error message in one of these versions but not the other...

best regards,
Florian Pflug

Attachments:

pg_xml_errorhandling.v2.patchapplication/octet-stream; name=pg_xml_errorhandling.v2.patchDownload
diff --git a/configure b/configure
index 9b8cb48..c4c535c 100755
*** a/configure
--- b/configure
*************** with_gnu_ld
*** 850,855 ****
--- 850,856 ----
  enable_largefile
  enable_float4_byval
  enable_float8_byval
+ enable_float8_byval
  '
        ac_precious_vars='build_alias
  host_alias
*************** LDFLAGS
*** 860,865 ****
--- 861,867 ----
  LIBS
  CPPFLAGS
  CPP
+ CPPFLAGS
  LDFLAGS_EX
  LDFLAGS_SL
  DOCBOOKSTYLE'
*************** done
*** 5740,5745 ****
--- 5742,5812 ----
        esac
      done
    fi
+ 
+   # Cannot use AC_CHECK_FUNC because xmlStructuredErrorContext may be a
+   # macro if threading support is enabled in libxml2
+   { $as_echo "$as_me:$LINENO: checking for xmlStructuredErrorContext" >&5
+ $as_echo_n "checking for xmlStructuredErrorContext... " >&6; }
+ if test "${pgac_cv_libxml_structerrctx+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h.  */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h.  */
+ #include <libxml/globals.h>
+ int
+ main ()
+ {
+ void* d = xmlStructuredErrorContext
+   ;
+   return 0;
+ }
+ _ACEOF
+ rm -f conftest.$ac_objext conftest$ac_exeext
+ if { (ac_try="$ac_link"
+ case "(($ac_try" in
+   *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+   *) ac_try_echo=$ac_try;;
+ esac
+ eval ac_try_echo="\"\$as_me:$LINENO: $ac_try_echo\""
+ $as_echo "$ac_try_echo") >&5
+   (eval "$ac_link") 2>conftest.er1
+   ac_status=$?
+   grep -v '^ *+' conftest.er1 >conftest.err
+   rm -f conftest.er1
+   cat conftest.err >&5
+   $as_echo "$as_me:$LINENO: \$? = $ac_status" >&5
+   (exit $ac_status); } && {
+ 	 test -z "$ac_c_werror_flag" ||
+ 	 test ! -s conftest.err
+        } && test -s conftest$ac_exeext && {
+ 	 test "$cross_compiling" = yes ||
+ 	 $as_test_x conftest$ac_exeext
+        }; then
+   pgac_cv_libxml_structerrctx=yes
+ else
+   $as_echo "$as_me: failed program was:" >&5
+ sed 's/^/| /' conftest.$ac_ext >&5
+ 
+ 	pgac_cv_libxml_structerrctx=no
+ fi
+ 
+ rm -rf conftest.dSYM
+ rm -f core conftest.err conftest.$ac_objext conftest_ipa8_conftest.oo \
+       conftest$ac_exeext conftest.$ac_ext
+ fi
+ { $as_echo "$as_me:$LINENO: result: $pgac_cv_libxml_structerrctx" >&5
+ $as_echo "$pgac_cv_libxml_structerrctx" >&6; }
+   if test x"$pgac_cv_libxml_structerrctx" = x"yes"; then
+ 
+ cat >>confdefs.h <<\_ACEOF
+ #define HAVE_LIBXML2_XMLSTRUCTUREDERRORCONTEXT 1
+ _ACEOF
+ 
+   fi
  fi
  
  
diff --git a/configure.in b/configure.in
index e873c7b..6f9f0c4 100644
*** a/configure.in
--- b/configure.in
*************** if test "$with_libxml" = yes ; then
*** 741,746 ****
--- 741,759 ----
        esac
      done
    fi
+ 
+   # Cannot use AC_CHECK_FUNC because xmlStructuredErrorContext may be a
+   # macro if threading support is enabled in libxml2
+   AC_CACHE_CHECK([for xmlStructuredErrorContext], pgac_cv_libxml_structerrctx,
+   [AC_TRY_LINK([#include <libxml/globals.h>],
+               [void* d = xmlStructuredErrorContext],
+               [pgac_cv_libxml_structerrctx=yes],
+               [pgac_cv_libxml_structerrctx=no])])
+   if test x"$pgac_cv_libxml_structerrctx" = x"yes"; then
+     AC_DEFINE(HAVE_LIBXML2_XMLSTRUCTUREDERRORCONTEXT,
+              1,
+              [Define to 1 if your libxml has xmlStructuredErrorContext.])
+   fi
  fi
  
  AC_SUBST(with_libxml)
diff --git a/contrib/xml2/xpath.c b/contrib/xml2/xpath.c
index 44c600e..f3417ab 100644
*** a/contrib/xml2/xpath.c
--- b/contrib/xml2/xpath.c
*************** Datum		xpath_table(PG_FUNCTION_ARGS);
*** 38,44 ****
  
  /* exported for use by xslt_proc.c */
  
! void		pgxml_parser_init(void);
  
  /* workspace for pgxml_xpath() */
  
--- 38,44 ----
  
  /* exported for use by xslt_proc.c */
  
! void		pgxml_parser_init(PgXmlStrictnessType strictness);
  
  /* workspace for pgxml_xpath() */
  
*************** static void cleanup_workspace(xpath_work
*** 70,79 ****
   * Initialize for xml parsing.
   */
  void
! pgxml_parser_init(void)
  {
  	/* Set up error handling (we share the core's error handler) */
! 	pg_xml_init();
  
  	/* Initialize libxml */
  	xmlInitParser();
--- 70,79 ----
   * Initialize for xml parsing.
   */
  void
! pgxml_parser_init(PgXmlStrictnessType strictness)
  {
  	/* Set up error handling (we share the core's error handler) */
! 	pg_xml_init_errorhandling(strictness);
  
  	/* Initialize libxml */
  	xmlInitParser();
*************** xml_is_well_formed(PG_FUNCTION_ARGS)
*** 100,113 ****
  	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
  	int32		docsize = VARSIZE(t) - VARHDRSZ;
  	xmlDocPtr	doctree;
  
! 	pgxml_parser_init();
  
  	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
! 	if (doctree == NULL)
! 		PG_RETURN_BOOL(false);	/* i.e. not well-formed */
  	xmlFreeDoc(doctree);
! 	PG_RETURN_BOOL(true);
  }
  
  
--- 100,116 ----
  	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
  	int32		docsize = VARSIZE(t) - VARHDRSZ;
  	xmlDocPtr	doctree;
+ 	bool		result;
  
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
! 	result = (doctree != NULL);
  	xmlFreeDoc(doctree);
! 
! 	pg_xml_done();
! 
! 	PG_RETURN_BOOL(result);
  }
  
  
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 406,416 ****
  	workspace->ctxt = NULL;
  	workspace->res = NULL;
  
! 	pgxml_parser_init();
  
  	workspace->doctree = xmlParseMemory((char *) VARDATA(document), docsize);
! 	if (workspace->doctree == NULL)
  		return NULL;			/* not well-formed */
  
  	workspace->ctxt = xmlXPathNewContext(workspace->doctree);
  	workspace->ctxt->node = xmlDocGetRootElement(workspace->doctree);
--- 409,421 ----
  	workspace->ctxt = NULL;
  	workspace->res = NULL;
  
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	workspace->doctree = xmlParseMemory((char *) VARDATA(document), docsize);
! 	if (workspace->doctree == NULL) {
! 		pg_xml_done();
  		return NULL;			/* not well-formed */
+ 	}
  
  	workspace->ctxt = xmlXPathNewContext(workspace->doctree);
  	workspace->ctxt->node = xmlDocGetRootElement(workspace->doctree);
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 420,425 ****
--- 425,431 ----
  	if (comppath == NULL)
  	{
  		cleanup_workspace(workspace);
+ 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"XPath Syntax Error");
  	}
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 433,438 ****
--- 439,446 ----
  	if (res == NULL)
  		cleanup_workspace(workspace);
  
+ 	pg_xml_done();
+ 
  	return res;
  }
  
*************** xpath_table(PG_FUNCTION_ARGS)
*** 659,665 ****
  	 * Setup the parser.  This should happen after we are done evaluating the
  	 * query, in case it calls functions that set up libxml differently.
  	 */
! 	pgxml_parser_init();
  
  	/* For each row i.e. document returned from SPI */
  	for (i = 0; i < proc; i++)
--- 667,673 ----
  	 * Setup the parser.  This should happen after we are done evaluating the
  	 * query, in case it calls functions that set up libxml differently.
  	 */
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	/* For each row i.e. document returned from SPI */
  	for (i = 0; i < proc; i++)
*************** xpath_table(PG_FUNCTION_ARGS)
*** 720,725 ****
--- 728,734 ----
  					if (comppath == NULL)
  					{
  						xmlFreeDoc(doctree);
+ 						pg_xml_done();
  						xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  									"XPath Syntax Error");
  					}
*************** xpath_table(PG_FUNCTION_ARGS)
*** 784,789 ****
--- 793,800 ----
  			pfree(xmldoc);
  	}
  
+ 	pg_xml_done();
+ 
  	tuplestore_donestoring(tupstore);
  
  	SPI_finish();
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index f8f7d72..0ebca21 100644
*** a/contrib/xml2/xslt_proc.c
--- b/contrib/xml2/xslt_proc.c
*************** Datum		xslt_process(PG_FUNCTION_ARGS);
*** 38,44 ****
  #ifdef USE_LIBXSLT
  
  /* declarations to come from xpath.c */
! extern void pgxml_parser_init(void);
  
  /* local defs */
  static const char **parse_params(text *paramstr);
--- 38,44 ----
  #ifdef USE_LIBXSLT
  
  /* declarations to come from xpath.c */
! extern void pgxml_parser_init(PgXmlStrictnessType strictness);
  
  /* local defs */
  static const char **parse_params(text *paramstr);
*************** xslt_process(PG_FUNCTION_ARGS)
*** 77,83 ****
  	}
  
  	/* Setup parser */
! 	pgxml_parser_init();
  
  	/* Check to see if document is a file or a literal */
  
--- 77,83 ----
  	}
  
  	/* Setup parser */
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	/* Check to see if document is a file or a literal */
  
*************** xslt_process(PG_FUNCTION_ARGS)
*** 86,94 ****
  	else
  		doctree = xmlParseFile(text_to_cstring(doct));
  
! 	if (doctree == NULL)
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"error parsing XML document");
  
  	/* Same for stylesheet */
  	if (VARDATA(ssheet)[0] == '<')
--- 86,96 ----
  	else
  		doctree = xmlParseFile(text_to_cstring(doct));
  
! 	if (doctree == NULL) {
! 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"error parsing XML document");
+ 	}
  
  	/* Same for stylesheet */
  	if (VARDATA(ssheet)[0] == '<')
*************** xslt_process(PG_FUNCTION_ARGS)
*** 98,103 ****
--- 100,106 ----
  		if (ssdoc == NULL)
  		{
  			xmlFreeDoc(doctree);
+ 			pg_xml_done();
  			xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  						"error parsing stylesheet as XML document");
  		}
*************** xslt_process(PG_FUNCTION_ARGS)
*** 112,117 ****
--- 115,121 ----
  	{
  		xmlFreeDoc(doctree);
  		xsltCleanupGlobals();
+ 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"failed to parse stylesheet");
  	}
*************** xslt_process(PG_FUNCTION_ARGS)
*** 125,130 ****
--- 129,136 ----
  
  	xsltCleanupGlobals();
  
+ 	pg_xml_done();
+ 
  	if (resstat < 0)
  		PG_RETURN_NULL();
  
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 702b9e3..a09c13e 100644
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
*************** int			xmloption;
*** 84,92 ****
  
  #ifdef USE_LIBXML
  
  static StringInfo xml_err_buf = NULL;
  
! static void xml_errorHandler(void *ctxt, const char *msg,...);
  static void xml_ereport_by_code(int level, int sqlcode,
  					const char *msg, int errcode);
  
--- 84,107 ----
  
  #ifdef USE_LIBXML
  
+ #define XML_CHECK_AND_EREPORT(assertion,level,sqlcode,msg) \
+ 	if (pg_xml_erroroccurred() || !(assertion)) { \
+ 		xml_ereport(level, sqlcode, msg); \
+ 	} \
+ 	else { \
+ 	}
+ 
+ static bool xml_error_initialized = false;
+ static xmlStructuredErrorFunc xml_structuredErrorFunc_saved = NULL;
+ static void* xml_structuredErrorContext_saved = NULL;
+ 
+ static PgXmlStrictnessType xml_strictness = PG_XML_STRICTNESS_NONE;
+ static bool xml_err_occurred;
  static StringInfo xml_err_buf = NULL;
  
! static void xml_errorHandler(void* data, xmlErrorPtr error);
! static void appendStringInfoLineSeparator(StringInfo str);
! static void chopStringInfoNewlines(StringInfo str);
  static void xml_ereport_by_code(int level, int sqlcode,
  					const char *msg, int errcode);
  
*************** xmlelement(XmlExprState *xmlExpr, ExprCo
*** 597,614 ****
  	}
  
  	/* now safe to run libxml */
! 	pg_xml_init();
  
  	PG_TRY();
  	{
  		buf = xmlBufferCreate();
! 		if (!buf)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate xmlBuffer");
  		writer = xmlNewTextWriterMemory(buf, 0);
! 		if (!writer)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate xmlTextWriter");
  
  		xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
  
--- 612,627 ----
  	}
  
  	/* now safe to run libxml */
! 	pg_xml_init(PG_XML_STRICTNESS_ALL);
  
  	PG_TRY();
  	{
  		buf = xmlBufferCreate();
! 		XML_CHECK_AND_EREPORT(buf != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							 "could not allocate xmlBuffer");
  		writer = xmlNewTextWriterMemory(buf, 0);
! 		XML_CHECK_AND_EREPORT(writer != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							  "could not allocate xmlTextWriter");
  
  		xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
  
*************** xmlelement(XmlExprState *xmlExpr, ExprCo
*** 644,655 ****
--- 657,673 ----
  			xmlFreeTextWriter(writer);
  		if (buf)
  			xmlBufferFree(buf);
+ 
+ 		pg_xml_done();
+ 
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlBufferFree(buf);
  
+ 	pg_xml_done();
+ 
  	return result;
  #else
  	NO_XML_SUPPORT();
*************** xml_is_document(xmltype *arg)
*** 843,865 ****
  #ifdef USE_LIBXML
  
  /*
!  * pg_xml_init --- set up for use of libxml
   *
   * This should be called by each function that is about to use libxml
!  * facilities.	It has two responsibilities: verify compatibility with the
!  * loaded libxml version (done on first call in a session) and establish
!  * or re-establish our libxml error handler.  The latter needs to be done
!  * anytime we might have passed control to add-on modules (eg libperl) which
!  * might have set their own error handler for libxml.
!  *
!  * This is exported for use by contrib/xml2, as well as other code that might
!  * wish to share use of this module's libxml error handler.
!  *
!  * TODO: xmlChar is utf8-char, make proper tuning (initdb with enc!=utf8 and
!  * check)
   */
  void
! pg_xml_init(void)
  {
  	static bool first_time = true;
  
--- 861,875 ----
  #ifdef USE_LIBXML
  
  /*
!  * pg_xml_init_library --- setup for use of libxml
   *
   * This should be called by each function that is about to use libxml
!  * facilities but doesn't require error handling. It initializes libxml
!  * and verifies compatibility with the loaded libxml version (done on first
!  * call in a session).
   */
  void
! pg_xml_init_library(void)
  {
  	static bool first_time = true;
  
*************** pg_xml_init(void)
*** 882,890 ****
  		oldcontext = MemoryContextSwitchTo(TopMemoryContext);
  		xml_err_buf = makeStringInfo();
  		MemoryContextSwitchTo(oldcontext);
! 
! 		/* Now that xml_err_buf exists, safe to call xml_errorHandler */
! 		xmlSetGenericErrorFunc(NULL, xml_errorHandler);
  
  #ifdef USE_LIBXMLCONTEXT
  		/* Set up memory allocation our way, too */
--- 892,898 ----
  		oldcontext = MemoryContextSwitchTo(TopMemoryContext);
  		xml_err_buf = makeStringInfo();
  		MemoryContextSwitchTo(oldcontext);
! 		xml_err_occurred = false;
  
  #ifdef USE_LIBXMLCONTEXT
  		/* Set up memory allocation our way, too */
*************** pg_xml_init(void)
*** 896,916 ****
  
  		first_time = false;
  	}
! 	else
! 	{
! 		/* Reset pre-existing buffer to empty */
! 		Assert(xml_err_buf != NULL);
! 		resetStringInfo(xml_err_buf);
  
! 		/*
! 		 * We re-establish the error callback function every time.	This makes
! 		 * it safe for other subsystems (PL/Perl, say) to also use libxml with
! 		 * their own callbacks ... so long as they likewise set up the
! 		 * callbacks on every use. It's cheap enough to not be worth worrying
! 		 * about, anyway.
! 		 */
! 		xmlSetGenericErrorFunc(NULL, xml_errorHandler);
  	}
  }
  
  
--- 904,1008 ----
  
  		first_time = false;
  	}
! }
  
! /*
!  * pg_xml_init --- set up for use of libxml and register an error handler
!  *
!  * This should be called by each function that is about to use libxml
!  * facilities and requires error handling. It initialized libxml with
!  * pg_xml_init_library() and establishes our libxml error handler.
!  * The strictness determines which errors are reported and which are ignored
!  *
!  * pg_xml_done() *must* be called after the caller is done using libxml
!  * to restore the original error handler.
!  *
!  * This is exported for use by contrib/xml2, as well as other code that might
!  * wish to share use of this module's libxml error handler.
!  *
!  * TODO: xmlChar is utf8-char, make proper tuning (initdb with enc!=utf8 and
!  * check)
!  */
! void
! pg_xml_init(PgXmlStrictnessType strictness)
! {
! 	pg_xml_init_library();
! 
! 	Assert(strictness != PG_XML_STRICTNESS_NONE);
! 
! 	/* Imbalanced calls of pg_xml_init() and pg_xml_done() */
! 	Assert(!xml_error_initialized);
! 	Assert(xml_strictness == PG_XML_STRICTNESS_NONE);
! 
! 	/* Set strictness level */
! 	xml_strictness = strictness;
! 
! 	/* Reset pre-existing buffer to empty */
! 	Assert(xml_err_buf != NULL);
! 	resetStringInfo(xml_err_buf);
! 	xml_err_occurred = false;
! 
! 	/* Save original error handler and install ours. libxml originally didn't
! 	 * distinguish between the context for generic and for structured error
! 	 * handler. If we're using such an libxml version, we must thus save
! 	 * the generic error context, even though we're using a structured
! 	 * error handler.
! 	 */
! 	xml_structuredErrorFunc_saved = xmlStructuredError;
! #if HAVE_LIBXML2_XMLSTRUCTUREDERRORCONTEXT
! 	xml_structuredErrorContext_saved = xmlStructuredErrorContext;
! #else
! 	xml_structuredErrorContext_saved = xmlGenericErrorContext;
! #endif
! 	xmlSetStructuredErrorFunc(NULL, xml_errorHandler);
! 
! 	xml_error_initialized = true;
! }
! 
! 
! /*
!  * pg_xml_done --- restore libxml state after pg_xml_init().
!  *
!  * This must be called if pg_xml_init() was called with a
!  * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
!  * (i.e. the structured error handler) to the original state.
!  *
!  * It is OK to call xml_ereport() after pg_xml_done() -
!  * pg_xml_done() leaves xml_err_buf as it is.
!  */
! void
! pg_xml_done(void)
! {
! 	/* Imbalanced calls of pg_xml_init() and pg_xml_done(). */
! 	Assert(xml_error_initialized);
! 	Assert(xml_strictness != PG_XML_STRICTNESS_NONE);
! 
! 	/* Errors should have been dealt with */
! 	Assert(!xml_err_occurred);
! 
! 	xmlSetStructuredErrorFunc(xml_structuredErrorContext_saved,
! 							  xml_structuredErrorFunc_saved);
! 	xml_strictness = PG_XML_STRICTNESS_NONE;
! 	xml_error_initialized = false;
! }
! 
! 
! 
! /*
!  * pg_xml_erroroccurred() --- Test and reset the error flag.
!  *
!  * Returns true if an libxml error occurred after the last call of
!  * this function.
!  */
! bool
! pg_xml_erroroccurred(void)
! {
! 	if (xml_err_occurred) {
! 		xml_err_occurred = false;
! 		return true;
  	}
+ 
+ 	return false;
  }
  
  
*************** parse_xml_decl(const xmlChar *str, size_
*** 969,975 ****
  	int			utf8char;
  	int			utf8len;
  
! 	pg_xml_init();
  
  	/* Initialize output arguments to "not present" */
  	if (version)
--- 1061,1071 ----
  	int			utf8char;
  	int			utf8len;
  
! 	/* Only initialize libxml. We don't need error handling here,
! 	 * but we do need to make sure libxml is initialized before
! 	 * calling any of its functions
! 	 */
! 	pg_xml_init_library();
  
  	/* Initialize output arguments to "not present" */
  	if (version)
*************** static bool
*** 1123,1130 ****
  print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone)
  {
- 	pg_xml_init();				/* why is this here? */
- 
  	if ((version && strcmp((char *) version, PG_XML_DEFAULT_VERSION) != 0)
  		|| (encoding && encoding != PG_UTF8)
  		|| standalone != -1)
--- 1219,1224 ----
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1175,1182 ****
  	int32		len;
  	xmlChar    *string;
  	xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt;
! 	xmlDocPtr	doc;
  
  	len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
  	string = xml_text2xmlChar(data);
--- 1269,1276 ----
  	int32		len;
  	xmlChar    *string;
  	xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt = NULL;
! 	xmlDocPtr	doc = NULL;
  
  	len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
  	string = xml_text2xmlChar(data);
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1187,1203 ****
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
! 	pg_xml_init();
  	xmlInitParser();
  
- 	ctxt = xmlNewParserCtxt();
- 	if (ctxt == NULL)
- 		xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
- 					"could not allocate parser context");
- 
  	/* Use a TRY block to ensure the ctxt is released */
  	PG_TRY();
  	{
  		if (xmloption_arg == XMLOPTION_DOCUMENT)
  		{
  			/*
--- 1281,1296 ----
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
! 	pg_xml_init(PG_XML_STRICTNESS_WELLFORMED);
  	xmlInitParser();
  
  	/* Use a TRY block to ensure the ctxt is released */
  	PG_TRY();
  	{
+ 		ctxt = xmlNewParserCtxt();
+ 		XML_CHECK_AND_EREPORT(ctxt != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 							  "could not allocate parser context");
+ 
  		if (xmloption_arg == XMLOPTION_DOCUMENT)
  		{
  			/*
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1212,1220 ****
  								 "UTF-8",
  								 XML_PARSE_NOENT | XML_PARSE_DTDATTR
  						   | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
! 			if (doc == NULL)
! 				xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 							"invalid XML document");
  		}
  		else
  		{
--- 1305,1312 ----
  								 "UTF-8",
  								 XML_PARSE_NOENT | XML_PARSE_DTDATTR
  						   | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
! 			XML_CHECK_AND_EREPORT(doc != NULL, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 								  "invalid XML document");
  		}
  		else
  		{
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1237,1259 ****
  
  			res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
  												   utf8string + count, NULL);
! 			if (res_code != 0)
! 			{
! 				xmlFreeDoc(doc);
! 				xml_ereport(ERROR, ERRCODE_INVALID_XML_CONTENT,
! 							"invalid XML content");
! 			}
  		}
  	}
  	PG_CATCH();
  	{
! 		xmlFreeParserCtxt(ctxt);
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlFreeParserCtxt(ctxt);
  
  	return doc;
  }
  
--- 1329,1355 ----
  
  			res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
  												   utf8string + count, NULL);
! 			XML_CHECK_AND_EREPORT(res_code == 0, ERROR, ERRCODE_INVALID_XML_CONTENT,
! 								  "invalid XML content");
  		}
  	}
  	PG_CATCH();
  	{
! 		if (doc != NULL)
! 			xmlFreeDoc(doc);
! 		if (ctxt != NULL)
! 			xmlFreeParserCtxt(ctxt);
! 
! 		pg_xml_done();
! 
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlFreeParserCtxt(ctxt);
  
+ 	pg_xml_done();
+ 
  	return doc;
  }
  
*************** xml_ereport(int level, int sqlcode, cons
*** 1354,1366 ****
  
  	if (detail)
  	{
- 		size_t		len;
- 
- 		/* libxml error messages end in '\n'; get rid of it */
- 		len = strlen(detail);
- 		if (len > 0 && detail[len - 1] == '\n')
- 			detail[len - 1] = '\0';
- 
  		ereport(level,
  				(errcode(sqlcode),
  				 errmsg("%s", msg),
--- 1450,1455 ----
*************** xml_ereport(int level, int sqlcode, cons
*** 1376,1402 ****
  
  
  /*
!  * Error handler for libxml error messages
   */
  static void
! xml_errorHandler(void *ctxt, const char *msg,...)
  {
! 	/* Append the formatted text to xml_err_buf */
! 	for (;;)
  	{
! 		va_list		args;
! 		bool		success;
  
- 		/* Try to format the data. */
- 		va_start(args, msg);
- 		success = appendStringInfoVA(xml_err_buf, msg, args);
- 		va_end(args);
  
! 		if (success)
  			break;
  
! 		/* Double the buffer size and try again. */
! 		enlargeStringInfo(xml_err_buf, xml_err_buf->maxlen);
  	}
  }
  
--- 1465,1623 ----
  
  
  /*
!  * Append a newline after removing all existing trailing newlines
   */
  static void
! appendStringInfoLineSeparator(StringInfo str)
  {
! 	chopStringInfoNewlines(str);
! 
! 	if (str->len > 0)
! 		appendStringInfoChar(str, '\n');
! }
! 
! 
! /*
!  * Remove all trailing newlines
!  */
! static void
! chopStringInfoNewlines(StringInfo str)
! {
! 	while ((str->len > 0) &&
! 		   (str->data[str->len-1] == '\n'))
  	{
! 		str->data[--str->len] = '\0';
! 	}
! }
  
  
! /*
!  * Error handler for libxml errors and warnings
!  */
! static void
! xml_errorHandler(void* data, xmlErrorPtr error)
! {
! 	int			domain;
! 	int			level;
! 	StringInfo errorBuf = makeStringInfo();
! 
! 	/* Get parser and input context */
! 	xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr) error->ctxt;
! 	xmlParserInputPtr input = (ctxt != NULL) ? ctxt->input : NULL;
! 	xmlNodePtr node = error->node;
! 	const xmlChar* name = ((node != NULL) && (node->type == XML_ELEMENT_NODE)) ?
! 					node->name : NULL;
! 
! 	/* Older libxml versions report some errors differently.
! 	 * First, some errors were previously reported as
! 	 * coming from the parser domain but are now reported as
! 	 * coming from the namespace domain.
! 	 * Second, some warnings were upgraded to errors.
! 	 * We attempt to compensate for that here
! 	 */
! 	domain = error->domain;
! 	level = error->level;
! 	switch (error->code) {
! 		case XML_WAR_NS_URI:
! 			level = XML_ERR_ERROR;
! 			domain = XML_FROM_NAMESPACE;
  			break;
  
! 		case XML_ERR_NS_DECL_ERROR:
! 		case XML_WAR_NS_URI_RELATIVE:
! 		case XML_WAR_NS_COLUMN:
! 		case XML_NS_ERR_XML_NAMESPACE:
! 		case XML_NS_ERR_UNDEFINED_NAMESPACE:
! 		case XML_NS_ERR_QNAME:
! 		case XML_NS_ERR_ATTRIBUTE_REDEFINED:
! 		case XML_NS_ERR_EMPTY:
! 			domain = XML_FROM_NAMESPACE;
! 			break;
! 	}
! 
! 	/* Decide whether to act on the error or not */
! 	switch (domain) {
! 		case XML_FROM_PARSER:
! 		case XML_FROM_NONE:
! 		case XML_FROM_MEMORY:
! 		case XML_FROM_IO:
! 			/* Accept regardless of the parsing purpose */
! 			break;
! 
! 		default:
! 			/* Ignore during well-formedness check */
! 			if (xml_strictness == PG_XML_STRICTNESS_WELLFORMED)
! 				return;
! 			break;
! 	}
! 
! 	/* Append error message to xml_err_buf */
! 	if (error->line > 0)
! 		appendStringInfo(errorBuf, "line %d: ", error->line);
! 	if (name != NULL)
! 		appendStringInfo(errorBuf, "element %s: ", name);
! 	appendStringInfo(errorBuf, "%s", error->message);
! 
! 	/* Append context information to xml_err_buf.
! 	 * xmlParserPrintFileContext() uses the *generic* error handle to
! 	 * write the context. Since we don't want to duplicate libxml
! 	 * functionality here, we setup a generic error handler temporarily
! 	 */
! 	if (input != NULL)
! 	{
! 		/* Save generic error func and setup the xml_err_buf
! 		 * appender as generic error func. This should work because
! 		 * appendStringInfo() has essentially the same signature as
! 		 * xmlGenericErrorFunc().
! 		 */
! 		xmlGenericErrorFunc errFuncSaved = xmlGenericError;
! 		void* errCtxSaved = xmlGenericErrorContext;
! 		xmlSetGenericErrorFunc(errorBuf, (xmlGenericErrorFunc)appendStringInfo);
! 
! 		/* Generic context information */
! 		appendStringInfoLineSeparator(errorBuf);
! 		xmlParserPrintFileContext(input);
! 
! 		/* Restore generic error func */
! 		xmlSetGenericErrorFunc(errCtxSaved, errFuncSaved);
! 	}
! 
! 	chopStringInfoNewlines(errorBuf);
! 
! 	/* Legacy error handling. The error flag is never set. Exists because
! 	 * the xml2 contrib module uses our error-handling infrastructure, but
! 	 * we don't want to change its behaviour since it's deprecated anyway.
! 	 * This is also why we don't distinguish between notices, warnings
! 	 * and errors here - the old-style generic error handler wouldn't
! 	 * have done that either.
! 	 */
! 	if (xml_strictness == PG_XML_STRICTNESS_LEGACY)
! 	{
! 		appendStringInfoLineSeparator(xml_err_buf);
! 		appendStringInfoString(xml_err_buf, errorBuf->data);
! 		return;
! 	}
! 
! 	/* We don't want to ereport() here because that'd probably leave
! 	 * libxml in an inconsistent state. Instead, we remember the
! 	 * error and ereport() from xml_ereport().
! 	 *
! 	 * Warnings and notices are reported immediatly since they don't cause a
! 	 * longjmp() out of libxml.
! 	 */
! 	if (level >= XML_ERR_ERROR)
! 	{
! 		appendStringInfoLineSeparator(xml_err_buf);
! 		appendStringInfoString(xml_err_buf, errorBuf->data);
! 		xml_err_occurred = true;
! 	}
! 	else if (level >= XML_ERR_WARNING)
! 	{
! 		ereport(WARNING, (errmsg("%s", errorBuf->data)));
! 	}
! 	else
! 	{
! 		ereport(NOTICE, (errmsg("%s", errorBuf->data)));
  	}
  }
  
*************** map_sql_value_to_xml_value(Datum value, 
*** 1756,1773 ****
  					xmlTextWriterPtr writer = NULL;
  					char	   *result;
  
! 					pg_xml_init();
  
  					PG_TRY();
  					{
  						buf = xmlBufferCreate();
! 						if (!buf)
! 							xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 										"could not allocate xmlBuffer");
  						writer = xmlNewTextWriterMemory(buf, 0);
! 						if (!writer)
! 							xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 										"could not allocate xmlTextWriter");
  
  						if (xmlbinary == XMLBINARY_BASE64)
  							xmlTextWriterWriteBase64(writer, VARDATA_ANY(bstr),
--- 1977,1992 ----
  					xmlTextWriterPtr writer = NULL;
  					char	   *result;
  
! 					pg_xml_init(PG_XML_STRICTNESS_ALL);
  
  					PG_TRY();
  					{
  						buf = xmlBufferCreate();
! 						XML_CHECK_AND_EREPORT(buf != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 											  "could not allocate xmlBuffer");
  						writer = xmlNewTextWriterMemory(buf, 0);
! 						XML_CHECK_AND_EREPORT(writer != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 											  "could not allocate xmlTextWriter");
  
  						if (xmlbinary == XMLBINARY_BASE64)
  							xmlTextWriterWriteBase64(writer, VARDATA_ANY(bstr),
*************** map_sql_value_to_xml_value(Datum value, 
*** 1788,1799 ****
--- 2007,2023 ----
  							xmlFreeTextWriter(writer);
  						if (buf)
  							xmlBufferFree(buf);
+ 
+ 						pg_xml_done();
+ 
  						PG_RE_THROW();
  					}
  					PG_END_TRY();
  
  					xmlBufferFree(buf);
  
+ 					pg_xml_done();
+ 
  					return result;
  				}
  #endif   /* USE_LIBXML */
*************** xpath_internal(text *xpath_expr_text, xm
*** 3381,3387 ****
  	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
  	xpath_expr[xpath_len] = '\0';
  
! 	pg_xml_init();
  	xmlInitParser();
  
  	PG_TRY();
--- 3605,3611 ----
  	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
  	xpath_expr[xpath_len] = '\0';
  
! 	pg_xml_init(PG_XML_STRICTNESS_ALL);
  	xmlInitParser();
  
  	PG_TRY();
*************** xpath_internal(text *xpath_expr_text, xm
*** 3391,3411 ****
  		 * command execution are possible)
  		 */
  		ctxt = xmlNewParserCtxt();
! 		if (ctxt == NULL)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate parser context");
  		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 		if (doc == NULL)
! 			xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 						"could not parse XML document");
  		xpathctx = xmlXPathNewContext(doc);
! 		if (xpathctx == NULL)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate XPath context");
  		xpathctx->node = xmlDocGetRootElement(doc);
! 		if (xpathctx->node == NULL)
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"could not find root XML element");
  
  		/* register namespaces, if any */
  		if (ns_count > 0)
--- 3615,3631 ----
  		 * command execution are possible)
  		 */
  		ctxt = xmlNewParserCtxt();
! 		XML_CHECK_AND_EREPORT(ctxt != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							  "could not allocate parser context");
  		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 		XML_CHECK_AND_EREPORT(doc != NULL, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 							  "could not parse XML document");
  		xpathctx = xmlXPathNewContext(doc);
! 		XML_CHECK_AND_EREPORT(xpathctx != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							  "could not allocate XPath context");
  		xpathctx->node = xmlDocGetRootElement(doc);
! 		XML_CHECK_AND_EREPORT(xpathctx->node != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 							  "could not find root XML element");
  
  		/* register namespaces, if any */
  		if (ns_count > 0)
*************** xpath_internal(text *xpath_expr_text, xm
*** 3432,3440 ****
  		}
  
  		xpathcomp = xmlXPathCompile(xpath_expr);
! 		if (xpathcomp == NULL)	/* TODO: show proper XPath error details */
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"invalid XPath expression");
  
  		/*
  		 * Version 2.6.27 introduces a function named
--- 3652,3659 ----
  		}
  
  		xpathcomp = xmlXPathCompile(xpath_expr);
! 		XML_CHECK_AND_EREPORT(xpathcomp != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 							  "invalid XPath expression");
  
  		/*
  		 * Version 2.6.27 introduces a function named
*************** xpath_internal(text *xpath_expr_text, xm
*** 3444,3452 ****
  		 * the same.
  		 */
  		xpathobj = xmlXPathCompiledEval(xpathcomp, xpathctx);
! 		if (xpathobj == NULL)	/* TODO: reason? */
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"could not create XPath object");
  
  		/* return empty array in cases when nothing is found */
  		if (xpathobj->nodesetval == NULL)
--- 3663,3670 ----
  		 * the same.
  		 */
  		xpathobj = xmlXPathCompiledEval(xpathcomp, xpathctx);
! 		XML_CHECK_AND_EREPORT(xpathobj != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 							  "could not create XPath object");
  
  		/* return empty array in cases when nothing is found */
  		if (xpathobj->nodesetval == NULL)
*************** xpath_internal(text *xpath_expr_text, xm
*** 3481,3486 ****
--- 3699,3707 ----
  			xmlFreeDoc(doc);
  		if (ctxt)
  			xmlFreeParserCtxt(ctxt);
+ 
+ 		pg_xml_done();
+ 
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
*************** xpath_internal(text *xpath_expr_text, xm
*** 3490,3495 ****
--- 3711,3718 ----
  	xmlXPathFreeContext(xpathctx);
  	xmlFreeDoc(doc);
  	xmlFreeParserCtxt(ctxt);
+ 
+ 	pg_xml_done();
  }
  #endif   /* USE_LIBXML */
  
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 6359cd6..bc4a74d 100644
*** a/src/include/utils/xml.h
--- b/src/include/utils/xml.h
*************** typedef enum
*** 68,74 ****
  	XML_STANDALONE_OMITTED
  }	XmlStandaloneType;
  
! extern void pg_xml_init(void);
  extern void xml_ereport(int level, int sqlcode, const char *msg);
  extern xmltype *xmlconcat(List *args);
  extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
--- 68,87 ----
  	XML_STANDALONE_OMITTED
  }	XmlStandaloneType;
  
! typedef enum
! {
! 	PG_XML_STRICTNESS_NONE		/* No error handling */,
! 	PG_XML_STRICTNESS_LEGACY	/* Ignore notices/warnings/errors unless
! 								 * function result indicates error condition
! 								 */,
! 	PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser notices/warnings/errors */,
! 	PG_XML_STRICTNESS_ALL		/* Report all notices/warnings/errors */,
! } PgXmlStrictnessType;
! 
! extern void pg_xml_init_library(void);
! extern void pg_xml_init(PgXmlStrictnessType strictness);
! extern void pg_xml_done(void);
! extern bool pg_xml_erroroccurred(void);
  extern void xml_ereport(int level, int sqlcode, const char *msg);
  extern xmltype *xmlconcat(List *args);
  extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index eaa5a74..364ce51 100644
*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
  ERROR:  invalid XML content
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
  <wrong
        ^
  SELECT * FROM xmltest;
--- 8,14 ----
  ERROR:  invalid XML content
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
  <wrong
        ^
  SELECT * FROM xmltest;
*************** SELECT xmlconcat('bad', '<syntax');
*** 62,68 ****
  ERROR:  invalid XML content
  LINE 1: SELECT xmlconcat('bad', '<syntax');
                                  ^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag syntax line 1
  <syntax
         ^
  SELECT xmlconcat('<foo/>', NULL, '<?xml version="1.1" standalone="no"?><bar/>');
--- 62,68 ----
  ERROR:  invalid XML content
  LINE 1: SELECT xmlconcat('bad', '<syntax');
                                  ^
! DETAIL:  line 1: Couldn't find end of Start Tag syntax line 1
  <syntax
         ^
  SELECT xmlconcat('<foo/>', NULL, '<?xml version="1.1" standalone="no"?><bar/>');
*************** SELECT xmlparse(content '<abc>x</abc>');
*** 206,214 ****
   <abc>x</abc>
  (1 row)
  
  SELECT xmlparse(document 'abc');
  ERROR:  invalid XML document
! DETAIL:  Entity: line 1: parser error : Start tag expected, '<' not found
  abc
  ^
  SELECT xmlparse(document '<abc>x</abc>');
--- 206,259 ----
   <abc>x</abc>
  (1 row)
  
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: xmlParseEntityRef: no name
+ <invalidentity>&</invalidentity>
+                 ^
+ line 1: chunk is not well balanced
+ <invalidentity>&</invalidentity>
+                                 ^
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <undefinedentity>&idontexist;</undefinedentity>
+                              ^
+ line 1: chunk is not well balanced
+ <undefinedentity>&idontexist;</undefinedentity>
+                                                ^
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+          xmlparse          
+ ---------------------------
+  <invalidns xmlns='&lt;'/>
+ (1 row)
+ 
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+             xmlparse            
+ --------------------------------
+  <relativens xmlns='relative'/>
+ (1 row)
+ 
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <twoerrors>&idontexist;</unbalanced>
+                        ^
+ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ line 1: chunk is not well balanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
+       xmlparse       
+ ---------------------
+  <nosuchprefix:tag/>
+ (1 row)
+ 
  SELECT xmlparse(document 'abc');
  ERROR:  invalid XML document
! DETAIL:  line 1: Start tag expected, '<' not found
  abc
  ^
  SELECT xmlparse(document '<abc>x</abc>');
*************** SELECT xmlparse(document '<abc>x</abc>')
*** 217,222 ****
--- 262,309 ----
   <abc>x</abc>
  (1 row)
  
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: xmlParseEntityRef: no name
+ <invalidentity>&</abc>
+                 ^
+ line 1: Opening and ending tag mismatch: invalidentity line 1 and abc
+ <invalidentity>&</abc>
+                       ^
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <undefinedentity>&idontexist;</abc>
+                              ^
+ line 1: Opening and ending tag mismatch: undefinedentity line 1 and abc
+ <undefinedentity>&idontexist;</abc>
+                                    ^
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+          xmlparse          
+ ---------------------------
+  <invalidns xmlns='&lt;'/>
+ (1 row)
+ 
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+             xmlparse            
+ --------------------------------
+  <relativens xmlns='relative'/>
+ (1 row)
+ 
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <twoerrors>&idontexist;</unbalanced>
+                        ^
+ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
+       xmlparse       
+ ---------------------
+  <nosuchprefix:tag/>
+ (1 row)
+ 
  SELECT xmlpi(name foo);
    xmlpi  
  ---------
*************** SELECT '<>' IS NOT DOCUMENT;
*** 379,385 ****
  ERROR:  invalid XML content
  LINE 1: SELECT '<>' IS NOT DOCUMENT;
                 ^
! DETAIL:  Entity: line 1: parser error : StartTag: invalid element name
  <>
   ^
  SELECT xmlagg(data) FROM xmltest;
--- 466,472 ----
  ERROR:  invalid XML content
  LINE 1: SELECT '<>' IS NOT DOCUMENT;
                 ^
! DETAIL:  line 1: StartTag: invalid element name
  <>
   ^
  SELECT xmlagg(data) FROM xmltest;
*************** EXECUTE foo ('bad');
*** 425,431 ****
  ERROR:  invalid XML document
  LINE 1: EXECUTE foo ('bad');
                       ^
! DETAIL:  Entity: line 1: parser error : Start tag expected, '<' not found
  bad
  ^
  SET XML OPTION CONTENT;
--- 512,518 ----
  ERROR:  invalid XML document
  LINE 1: EXECUTE foo ('bad');
                       ^
! DETAIL:  line 1: Start tag expected, '<' not found
  bad
  ^
  SET XML OPTION CONTENT;
*************** SELECT xml_is_well_formed('<pg:foo xmlns
*** 679,684 ****
--- 766,801 ----
   t
  (1 row)
  
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+  xml_is_well_formed 
+ --------------------
+  t
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+  xml_is_well_formed 
+ --------------------
+  t
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
   xml_is_well_formed 
*************** SELECT xml_is_well_formed('abc');
*** 686,688 ****
--- 803,838 ----
   t
  (1 row)
  
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ -- Since different libxml versions emit slightly different
+ -- error messages, we suppress them
+ \set VERBOSITY terse
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ ERROR:  could not parse XML document
+ \set VERBOSITY default
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ ERROR:  could not parse XML document
+ DETAIL:  line 1: Namespace prefix nosuchprefix on tag is not defined
+ <nosuchprefix:tag/>
+                  ^
+ CONTEXT:  SQL function "xpath" statement 1
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+ WARNING:  line 1: xmlns: URI relative is not absolute
+ <relativens xmlns='relative'/>
+                             ^
+ CONTEXT:  SQL function "xpath" statement 1
+                 xpath                 
+ --------------------------------------
+  {"<relativens xmlns=\"relative\"/>"}
+ (1 row)
+ 
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 711b435..75e78a0 100644
*** a/src/test/regress/expected/xml_1.out
--- b/src/test/regress/expected/xml_1.out
*************** SELECT xmlparse(content '<abc>x</abc>');
*** 172,177 ****
--- 172,201 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SELECT xmlparse(document 'abc');
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
*************** SELECT xmlparse(document '<abc>x</abc>')
*** 180,185 ****
--- 204,233 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SELECT xmlpi(name foo);
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
*************** SELECT xml_is_well_formed('<pg:foo xmlns
*** 627,634 ****
--- 675,731 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ -- Since different libxml versions emit slightly different
+ -- error messages, we suppress them
+ \set VERBOSITY terse
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature at character 20
+ \set VERBOSITY default
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ LINE 1: SELECT xpath('/*', '<nosuchprefix:tag/>');
+                            ^
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ LINE 1: SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+                            ^
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 717a1e7..b1a3a2c 100644
*** a/src/test/regress/sql/xml.sql
--- b/src/test/regress/sql/xml.sql
*************** SELECT xmlelement(name foo, xmlattribute
*** 62,70 ****
--- 62,82 ----
  
  SELECT xmlparse(content 'abc');
  SELECT xmlparse(content '<abc>x</abc>');
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
  
  SELECT xmlparse(document 'abc');
  SELECT xmlparse(document '<abc>x</abc>');
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
  
  
  SELECT xmlpi(name foo);
*************** SELECT xml_is_well_formed('<foo><bar>baz
*** 208,213 ****
--- 220,251 ----
  SELECT xml_is_well_formed('<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
  SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>');
  SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</pg:foo>');
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
  
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
+ 
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ -- Since different libxml versions emit slightly different
+ -- error messages, we suppress them
+ \set VERBOSITY terse
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ \set VERBOSITY default
+ 
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ 
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
#10Noah Misch
noah@leadboat.com
In reply to: Florian Pflug (#9)
Re: Another issue with invalid XML values

Hi Florian,

On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:

Updated patch attached.

This builds and passes all tests on both test systems I used previously
(CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
2.6.31.dfsg-2ubuntu1.6). Tested behaviors look good, too.

On Jun20, 2011, at 22:45 , Noah Misch wrote:

On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:

On Jun20, 2011, at 19:57 , Noah Misch wrote:

Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
available and xmlGenericErrorContext otherwise, I would just add an Autoconf
check to identify which one to use.

It seems that before xmlStructuredErrorContext was introduced, the structured
and the generic error handler shared an error context, so doing what you
suggested looks sensible.

I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
this the right way. I basically copied a similar check (for setlongjump AFAIR)
and adapted it to check for xmlStructuredErrorContext instead.

Turned out fine. Note that, more often than not, committers ask for patches
not to contain the diff of the generated "configure" itself. (I personally
don't mind it when the diff portion is small and generated with the correct
version of Autoconf, as in this patch.)

! XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */,

It's fine to set the flag for legacy users, considering they could just not
read it. The important distinction seems to be that we don't emit warnings or
notices in this case. Is that correct? If so, the comment should reflect
that emphasis. Then, consider updating the flag unconditionally.

I took me a while to remember while I did it that way, but I think I have now.

I initialled wanted to add an Assert(!xml_error_occurred) to catch any
missing XML_REPORT_ERROR() calls. I seems to have forgotten to do that,
though...

So I guess I should either refrain from setting the flag as you suggested,
or add such an Assert(). I think I very slightly prefer the latter, what
do you think?

I do like the idea of that assert. How about setting the flag anyway, but
making it "Assert(xml_purpose == XML_PURPOSE_LEGACY || !xml_error_occurred)"?

I added the Assert, but didn't make the setting of the error flag unconditional.

If I did that, XML_CHECK_AND_EREPORT() would stop working for the LEGACY
case. Now, that case currently isn't exercised, but breaking nevertheless
seemed wrong to me.

Okay.

*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
<wrong
^
SELECT * FROM xmltest;
--- 8,14 ----
ERROR:  invalid XML content
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1

Any reason to change the error text this way?

The "Entity:" prefix is added by libxml only for messages reported
to the generic error handler. It *doesn't* refer to entities as defined
in xml, but instead used in place of the file name if libxml
doesn't have that at hand (because it's parsing from memory).

So that "Entity:" prefix really doesn't have any meaning whatsoever.

So xmlParserPrintFileContext() sends different content to the generic error
handler from what "natural" calls to the handler would send?

xmlParserPrintFileContext() only generates the "context" part of the error
message. In the example above, these are the two lines
<wrong
^
These lines don't contain the "Entity:" prefix - neither with the patch
attached nor without.

Ah, my mistake.

I believe that the compatibility risk is pretty low here, and removing
the meaningless prefix makes the error message are bit nicer to read.
But if you are concerned that there maybe applications out there who
parse the error text, we could of course add it back. I must say that
I don't really know what our policy regarding error message stability is...

If the libxml2 API makes it a major pain to preserve exact messages, I
wouldn't worry.

It'd only require us to prepend "Entity: " to the message string, so
there's no pain there. The question is rather whether we want to continue
having a pure noise word in front of every libxml error message for
the sake of compatibility.

There's also the " parser error :" difference; would that also be easy to
re-add? (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

I vote "Nay", on the grounds that I estimate the actual breakage from
such a change to be approximately zero. Plus the fact that libxml
error messages aren't completely stable either. I had to suppress the
DETAIL output for one of the regression tests to make them work for
both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
URI in it's error message in one of these versions but not the other...

I'm not particularly worried about actual breakage; I'm just putting on the
"one change per patch"-lawyer hat. All other things being equal, I'd prefer
one patch that changes the mechanism for marshalling errors while minimally
changing the format of existing errors, then another patch that proposes new
formatting. (Granted, the formatting-only patch would probably founder.) But
it's a judgement call, and this change is in a grey area. I'm fine with
leaving it up to the committer.

A few minor code comments:

*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c

+ static bool xml_error_initialized = false;

Since xml_error_initialized is only used for asserts and now redundant with
asserts about xml_strictness == PG_XML_STRICTNESS_NONE, consider removing it.

! /*
! * pg_xml_done --- restore libxml state after pg_xml_init().
! *
! * This must be called if pg_xml_init() was called with a
! * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
! * (i.e. the structured error handler) to the original state.

The first sentence is obsolete.

*** a/src/include/utils/xml.h
--- b/src/include/utils/xml.h
*************** typedef enum
*** 68,74 ****
XML_STANDALONE_OMITTED
}	XmlStandaloneType;
! extern void pg_xml_init(void);
extern void xml_ereport(int level, int sqlcode, const char *msg);
extern xmltype *xmlconcat(List *args);
extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
--- 68,87 ----
XML_STANDALONE_OMITTED
}	XmlStandaloneType;

! typedef enum
! {
! PG_XML_STRICTNESS_NONE /* No error handling */,
! PG_XML_STRICTNESS_LEGACY /* Ignore notices/warnings/errors unless
! * function result indicates error condition
! */,
! PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser notices/warnings/errors */,
! PG_XML_STRICTNESS_ALL /* Report all notices/warnings/errors */,
! } PgXmlStrictnessType;

We don't generally prefix backend names with Pg/PG. Also, I think the word
"Type" in "PgXmlStrictnessType" is redundant. How about just XmlStrictness?

Thanks,
nm

#11Florian Pflug
fgp@phlo.org
In reply to: Noah Misch (#10)
Re: Another issue with invalid XML values

On Jun24, 2011, at 13:24 , Noah Misch wrote:

On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:

Updated patch attached.

This builds and passes all tests on both test systems I used previously
(CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
2.6.31.dfsg-2ubuntu1.6). Tested behaviors look good, too.

On Jun20, 2011, at 22:45 , Noah Misch wrote:

On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:

On Jun20, 2011, at 19:57 , Noah Misch wrote:

Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
available and xmlGenericErrorContext otherwise, I would just add an Autoconf
check to identify which one to use.

It seems that before xmlStructuredErrorContext was introduced, the structured
and the generic error handler shared an error context, so doing what you
suggested looks sensible.

I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
this the right way. I basically copied a similar check (for setlongjump AFAIR)
and adapted it to check for xmlStructuredErrorContext instead.

Turned out fine. Note that, more often than not, committers ask for patches
not to contain the diff of the generated "configure" itself. (I personally
don't mind it when the diff portion is small and generated with the correct
version of Autoconf, as in this patch.)

Ah, OK. I didn't know what project policy was there, so I figured I'd include
it since the configure script is tracked by git too. Now that I know, I'll remove
it from the final patch.

I believe that the compatibility risk is pretty low here, and removing
the meaningless prefix makes the error message are bit nicer to read.
But if you are concerned that there maybe applications out there who
parse the error text, we could of course add it back. I must say that
I don't really know what our policy regarding error message stability is...

If the libxml2 API makes it a major pain to preserve exact messages, I
wouldn't worry.

It'd only require us to prepend "Entity: " to the message string, so
there's no pain there. The question is rather whether we want to continue
having a pure noise word in front of every libxml error message for
the sake of compatibility.

There's also the " parser error :" difference; would that also be easy to
re-add? (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

It's the string representation of the error domain and error level. Unfortunately,
the logic which builds that string representation is contained in the static
function xmlReportError() deep within libxml, and that function doesn't seem
to be called at all for errors reported via a structured error handler :-(

So re-adding that would mean duplicating that code within our xml.c, which
seems undesirable...

I vote "Nay", on the grounds that I estimate the actual breakage from
such a change to be approximately zero. Plus the fact that libxml
error messages aren't completely stable either. I had to suppress the
DETAIL output for one of the regression tests to make them work for
both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
URI in it's error message in one of these versions but not the other...

I'm not particularly worried about actual breakage; I'm just putting on the
"one change per patch"-lawyer hat. All other things being equal, I'd prefer
one patch that changes the mechanism for marshalling errors while minimally
changing the format of existing errors, then another patch that proposes new
formatting. (Granted, the formatting-only patch would probably founder.) But
it's a judgement call, and this change is in a grey area. I'm fine with
leaving it up to the committer.

I agree with that principle, in principle. In the light of my paragraph above
about how we'd need to duplicate libxml code to keep the error messages the same,
however, I'll leave it up to the committer as you suggested.

A few minor code comments:

*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c

+ static bool xml_error_initialized = false;

Since xml_error_initialized is only used for asserts and now redundant with
asserts about xml_strictness == PG_XML_STRICTNESS_NONE, consider removing it.

You're absolutely right. Will remove.

! /*
! * pg_xml_done --- restore libxml state after pg_xml_init().
! *
! * This must be called if pg_xml_init() was called with a
! * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
! * (i.e. the structured error handler) to the original state.

The first sentence is obsolete.

Right again. Will remove.

*** a/src/include/utils/xml.h
--- b/src/include/utils/xml.h
*************** typedef enum
*** 68,74 ****
XML_STANDALONE_OMITTED
}	XmlStandaloneType;
! extern void pg_xml_init(void);
extern void xml_ereport(int level, int sqlcode, const char *msg);
extern xmltype *xmlconcat(List *args);
extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
--- 68,87 ----
XML_STANDALONE_OMITTED
}	XmlStandaloneType;

! typedef enum
! {
! PG_XML_STRICTNESS_NONE /* No error handling */,
! PG_XML_STRICTNESS_LEGACY /* Ignore notices/warnings/errors unless
! * function result indicates error condition
! */,
! PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser notices/warnings/errors */,
! PG_XML_STRICTNESS_ALL /* Report all notices/warnings/errors */,
! } PgXmlStrictnessType;

We don't generally prefix backend names with Pg/PG. Also, I think the word
"Type" in "PgXmlStrictnessType" is redundant. How about just XmlStrictness?

I agree with removing the "Type" suffix. I'm not so sure about the "Pg"
prefix, though. I've added that after actually hitting a conflict between
one of this enum's constants and some constant defined by libxml. Admittedly,
that was *before* I renamed the type to "PgXmlStrictnessType", so removing
the "Pg" prefix now would probably work. But it seems a bit close for
comfort - libxml defines a ton of constants named XML_something.

Still, if you feel that the "Pg" prefix looks to weird and believe that it's
better to wait until there's an actual clash before renaming things, I won't
object further. Just wanted to bring the issue to your attention...

I'll post an updated patch once that last issue is resolved.

best regards,
Florian Pflug

#12Noah Misch
noah@leadboat.com
In reply to: Florian Pflug (#11)
Re: Another issue with invalid XML values

On Fri, Jun 24, 2011 at 04:15:35PM +0200, Florian Pflug wrote:

On Jun24, 2011, at 13:24 , Noah Misch wrote:

On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:

There's also the " parser error :" difference; would that also be easy to
re-add? (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

It's the string representation of the error domain and error level. Unfortunately,
the logic which builds that string representation is contained in the static
function xmlReportError() deep within libxml, and that function doesn't seem
to be called at all for errors reported via a structured error handler :-(

So re-adding that would mean duplicating that code within our xml.c, which
seems undesirable...

Yes, that seems like sufficient trouble to not undertake merely for the sake of
a cosmetic error message match.

! typedef enum
! {
! PG_XML_STRICTNESS_NONE /* No error handling */,
! PG_XML_STRICTNESS_LEGACY /* Ignore notices/warnings/errors unless
! * function result indicates error condition
! */,
! PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser notices/warnings/errors */,
! PG_XML_STRICTNESS_ALL /* Report all notices/warnings/errors */,
! } PgXmlStrictnessType;

We don't generally prefix backend names with Pg/PG. Also, I think the word
"Type" in "PgXmlStrictnessType" is redundant. How about just XmlStrictness?

I agree with removing the "Type" suffix. I'm not so sure about the "Pg"
prefix, though. I've added that after actually hitting a conflict between
one of this enum's constants and some constant defined by libxml. Admittedly,
that was *before* I renamed the type to "PgXmlStrictnessType", so removing
the "Pg" prefix now would probably work. But it seems a bit close for
comfort - libxml defines a ton of constants named XML_something.

Still, if you feel that the "Pg" prefix looks to weird and believe that it's
better to wait until there's an actual clash before renaming things, I won't
object further. Just wanted to bring the issue to your attention...

I do think it looks weird, but I agree that the potential for conflict is
notably higher than normal. I'm fine with having the prefix.

Thanks,
nm

#13Florian Pflug
fgp@phlo.org
In reply to: Noah Misch (#12)
1 attachment(s)
Re: Another issue with invalid XML values

Hi

Updated patch attached. Do you think this is "Ready for Committer"?

best regards,
Florian Pflug

Attachments:

pg_xml_errorhandling.v3.patchapplication/octet-stream; name=pg_xml_errorhandling.v3.patch; x-unix-mode=0644Download
diff --git a/configure.in b/configure.in
index e873c7b..6f9f0c4 100644
*** a/configure.in
--- b/configure.in
*************** if test "$with_libxml" = yes ; then
*** 741,746 ****
--- 741,759 ----
        esac
      done
    fi
+ 
+   # Cannot use AC_CHECK_FUNC because xmlStructuredErrorContext may be a
+   # macro if threading support is enabled in libxml2
+   AC_CACHE_CHECK([for xmlStructuredErrorContext], pgac_cv_libxml_structerrctx,
+   [AC_TRY_LINK([#include <libxml/globals.h>],
+               [void* d = xmlStructuredErrorContext],
+               [pgac_cv_libxml_structerrctx=yes],
+               [pgac_cv_libxml_structerrctx=no])])
+   if test x"$pgac_cv_libxml_structerrctx" = x"yes"; then
+     AC_DEFINE(HAVE_LIBXML2_XMLSTRUCTUREDERRORCONTEXT,
+              1,
+              [Define to 1 if your libxml has xmlStructuredErrorContext.])
+   fi
  fi
  
  AC_SUBST(with_libxml)
diff --git a/contrib/xml2/xpath.c b/contrib/xml2/xpath.c
index 44c600e..8170267 100644
*** a/contrib/xml2/xpath.c
--- b/contrib/xml2/xpath.c
*************** Datum		xpath_table(PG_FUNCTION_ARGS);
*** 38,44 ****
  
  /* exported for use by xslt_proc.c */
  
! void		pgxml_parser_init(void);
  
  /* workspace for pgxml_xpath() */
  
--- 38,44 ----
  
  /* exported for use by xslt_proc.c */
  
! void		pgxml_parser_init(PgXmlStrictness strictness);
  
  /* workspace for pgxml_xpath() */
  
*************** static void cleanup_workspace(xpath_work
*** 70,79 ****
   * Initialize for xml parsing.
   */
  void
! pgxml_parser_init(void)
  {
  	/* Set up error handling (we share the core's error handler) */
! 	pg_xml_init();
  
  	/* Initialize libxml */
  	xmlInitParser();
--- 70,79 ----
   * Initialize for xml parsing.
   */
  void
! pgxml_parser_init(PgXmlStrictness strictness)
  {
  	/* Set up error handling (we share the core's error handler) */
! 	pg_xml_init(strictness);
  
  	/* Initialize libxml */
  	xmlInitParser();
*************** xml_is_well_formed(PG_FUNCTION_ARGS)
*** 100,113 ****
  	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
  	int32		docsize = VARSIZE(t) - VARHDRSZ;
  	xmlDocPtr	doctree;
  
! 	pgxml_parser_init();
  
  	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
! 	if (doctree == NULL)
! 		PG_RETURN_BOOL(false);	/* i.e. not well-formed */
  	xmlFreeDoc(doctree);
! 	PG_RETURN_BOOL(true);
  }
  
  
--- 100,116 ----
  	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
  	int32		docsize = VARSIZE(t) - VARHDRSZ;
  	xmlDocPtr	doctree;
+ 	bool		result;
  
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
! 	result = (doctree != NULL);
  	xmlFreeDoc(doctree);
! 
! 	pg_xml_done();
! 
! 	PG_RETURN_BOOL(result);
  }
  
  
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 406,416 ****
  	workspace->ctxt = NULL;
  	workspace->res = NULL;
  
! 	pgxml_parser_init();
  
  	workspace->doctree = xmlParseMemory((char *) VARDATA(document), docsize);
! 	if (workspace->doctree == NULL)
  		return NULL;			/* not well-formed */
  
  	workspace->ctxt = xmlXPathNewContext(workspace->doctree);
  	workspace->ctxt->node = xmlDocGetRootElement(workspace->doctree);
--- 409,421 ----
  	workspace->ctxt = NULL;
  	workspace->res = NULL;
  
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	workspace->doctree = xmlParseMemory((char *) VARDATA(document), docsize);
! 	if (workspace->doctree == NULL) {
! 		pg_xml_done();
  		return NULL;			/* not well-formed */
+ 	}
  
  	workspace->ctxt = xmlXPathNewContext(workspace->doctree);
  	workspace->ctxt->node = xmlDocGetRootElement(workspace->doctree);
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 420,425 ****
--- 425,431 ----
  	if (comppath == NULL)
  	{
  		cleanup_workspace(workspace);
+ 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"XPath Syntax Error");
  	}
*************** pgxml_xpath(text *document, xmlChar *xpa
*** 433,438 ****
--- 439,446 ----
  	if (res == NULL)
  		cleanup_workspace(workspace);
  
+ 	pg_xml_done();
+ 
  	return res;
  }
  
*************** xpath_table(PG_FUNCTION_ARGS)
*** 659,665 ****
  	 * Setup the parser.  This should happen after we are done evaluating the
  	 * query, in case it calls functions that set up libxml differently.
  	 */
! 	pgxml_parser_init();
  
  	/* For each row i.e. document returned from SPI */
  	for (i = 0; i < proc; i++)
--- 667,673 ----
  	 * Setup the parser.  This should happen after we are done evaluating the
  	 * query, in case it calls functions that set up libxml differently.
  	 */
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	/* For each row i.e. document returned from SPI */
  	for (i = 0; i < proc; i++)
*************** xpath_table(PG_FUNCTION_ARGS)
*** 720,725 ****
--- 728,734 ----
  					if (comppath == NULL)
  					{
  						xmlFreeDoc(doctree);
+ 						pg_xml_done();
  						xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  									"XPath Syntax Error");
  					}
*************** xpath_table(PG_FUNCTION_ARGS)
*** 784,789 ****
--- 793,800 ----
  			pfree(xmldoc);
  	}
  
+ 	pg_xml_done();
+ 
  	tuplestore_donestoring(tupstore);
  
  	SPI_finish();
diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c
index f8f7d72..6008bc4 100644
*** a/contrib/xml2/xslt_proc.c
--- b/contrib/xml2/xslt_proc.c
*************** Datum		xslt_process(PG_FUNCTION_ARGS);
*** 38,44 ****
  #ifdef USE_LIBXSLT
  
  /* declarations to come from xpath.c */
! extern void pgxml_parser_init(void);
  
  /* local defs */
  static const char **parse_params(text *paramstr);
--- 38,44 ----
  #ifdef USE_LIBXSLT
  
  /* declarations to come from xpath.c */
! extern void pgxml_parser_init(PgXmlStrictness strictness);
  
  /* local defs */
  static const char **parse_params(text *paramstr);
*************** xslt_process(PG_FUNCTION_ARGS)
*** 77,83 ****
  	}
  
  	/* Setup parser */
! 	pgxml_parser_init();
  
  	/* Check to see if document is a file or a literal */
  
--- 77,83 ----
  	}
  
  	/* Setup parser */
! 	pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
  
  	/* Check to see if document is a file or a literal */
  
*************** xslt_process(PG_FUNCTION_ARGS)
*** 86,94 ****
  	else
  		doctree = xmlParseFile(text_to_cstring(doct));
  
! 	if (doctree == NULL)
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"error parsing XML document");
  
  	/* Same for stylesheet */
  	if (VARDATA(ssheet)[0] == '<')
--- 86,96 ----
  	else
  		doctree = xmlParseFile(text_to_cstring(doct));
  
! 	if (doctree == NULL) {
! 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"error parsing XML document");
+ 	}
  
  	/* Same for stylesheet */
  	if (VARDATA(ssheet)[0] == '<')
*************** xslt_process(PG_FUNCTION_ARGS)
*** 98,103 ****
--- 100,106 ----
  		if (ssdoc == NULL)
  		{
  			xmlFreeDoc(doctree);
+ 			pg_xml_done();
  			xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  						"error parsing stylesheet as XML document");
  		}
*************** xslt_process(PG_FUNCTION_ARGS)
*** 112,117 ****
--- 115,121 ----
  	{
  		xmlFreeDoc(doctree);
  		xsltCleanupGlobals();
+ 		pg_xml_done();
  		xml_ereport(ERROR, ERRCODE_EXTERNAL_ROUTINE_EXCEPTION,
  					"failed to parse stylesheet");
  	}
*************** xslt_process(PG_FUNCTION_ARGS)
*** 125,130 ****
--- 129,136 ----
  
  	xsltCleanupGlobals();
  
+ 	pg_xml_done();
+ 
  	if (resstat < 0)
  		PG_RETURN_NULL();
  
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 702b9e3..c5e3dac 100644
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
*************** int			xmloption;
*** 84,92 ****
  
  #ifdef USE_LIBXML
  
  static StringInfo xml_err_buf = NULL;
  
! static void xml_errorHandler(void *ctxt, const char *msg,...);
  static void xml_ereport_by_code(int level, int sqlcode,
  					const char *msg, int errcode);
  
--- 84,106 ----
  
  #ifdef USE_LIBXML
  
+ #define XML_CHECK_AND_EREPORT(assertion,level,sqlcode,msg) \
+ 	if (pg_xml_erroroccurred() || !(assertion)) { \
+ 		xml_ereport(level, sqlcode, msg); \
+ 	} \
+ 	else { \
+ 	}
+ 
+ static xmlStructuredErrorFunc xml_structuredErrorFunc_saved = NULL;
+ static void* xml_structuredErrorContext_saved = NULL;
+ 
+ static PgXmlStrictness xml_strictness = PG_XML_STRICTNESS_NONE;
+ static bool xml_err_occurred;
  static StringInfo xml_err_buf = NULL;
  
! static void xml_errorHandler(void* data, xmlErrorPtr error);
! static void appendStringInfoLineSeparator(StringInfo str);
! static void chopStringInfoNewlines(StringInfo str);
  static void xml_ereport_by_code(int level, int sqlcode,
  					const char *msg, int errcode);
  
*************** xmlelement(XmlExprState *xmlExpr, ExprCo
*** 597,614 ****
  	}
  
  	/* now safe to run libxml */
! 	pg_xml_init();
  
  	PG_TRY();
  	{
  		buf = xmlBufferCreate();
! 		if (!buf)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate xmlBuffer");
  		writer = xmlNewTextWriterMemory(buf, 0);
! 		if (!writer)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate xmlTextWriter");
  
  		xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
  
--- 611,626 ----
  	}
  
  	/* now safe to run libxml */
! 	pg_xml_init(PG_XML_STRICTNESS_ALL);
  
  	PG_TRY();
  	{
  		buf = xmlBufferCreate();
! 		XML_CHECK_AND_EREPORT(buf != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							 "could not allocate xmlBuffer");
  		writer = xmlNewTextWriterMemory(buf, 0);
! 		XML_CHECK_AND_EREPORT(writer != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							  "could not allocate xmlTextWriter");
  
  		xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
  
*************** xmlelement(XmlExprState *xmlExpr, ExprCo
*** 644,655 ****
--- 656,672 ----
  			xmlFreeTextWriter(writer);
  		if (buf)
  			xmlBufferFree(buf);
+ 
+ 		pg_xml_done();
+ 
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlBufferFree(buf);
  
+ 	pg_xml_done();
+ 
  	return result;
  #else
  	NO_XML_SUPPORT();
*************** xml_is_document(xmltype *arg)
*** 843,865 ****
  #ifdef USE_LIBXML
  
  /*
!  * pg_xml_init --- set up for use of libxml
   *
   * This should be called by each function that is about to use libxml
!  * facilities.	It has two responsibilities: verify compatibility with the
!  * loaded libxml version (done on first call in a session) and establish
!  * or re-establish our libxml error handler.  The latter needs to be done
!  * anytime we might have passed control to add-on modules (eg libperl) which
!  * might have set their own error handler for libxml.
!  *
!  * This is exported for use by contrib/xml2, as well as other code that might
!  * wish to share use of this module's libxml error handler.
!  *
!  * TODO: xmlChar is utf8-char, make proper tuning (initdb with enc!=utf8 and
!  * check)
   */
  void
! pg_xml_init(void)
  {
  	static bool first_time = true;
  
--- 860,874 ----
  #ifdef USE_LIBXML
  
  /*
!  * pg_xml_init_library --- setup for use of libxml
   *
   * This should be called by each function that is about to use libxml
!  * facilities but doesn't require error handling. It initializes libxml
!  * and verifies compatibility with the loaded libxml version (done on first
!  * call in a session).
   */
  void
! pg_xml_init_library(void)
  {
  	static bool first_time = true;
  
*************** pg_xml_init(void)
*** 883,891 ****
  		xml_err_buf = makeStringInfo();
  		MemoryContextSwitchTo(oldcontext);
  
- 		/* Now that xml_err_buf exists, safe to call xml_errorHandler */
- 		xmlSetGenericErrorFunc(NULL, xml_errorHandler);
- 
  #ifdef USE_LIBXMLCONTEXT
  		/* Set up memory allocation our way, too */
  		xml_memory_init();
--- 892,897 ----
*************** pg_xml_init(void)
*** 896,916 ****
  
  		first_time = false;
  	}
! 	else
! 	{
! 		/* Reset pre-existing buffer to empty */
! 		Assert(xml_err_buf != NULL);
! 		resetStringInfo(xml_err_buf);
  
! 		/*
! 		 * We re-establish the error callback function every time.	This makes
! 		 * it safe for other subsystems (PL/Perl, say) to also use libxml with
! 		 * their own callbacks ... so long as they likewise set up the
! 		 * callbacks on every use. It's cheap enough to not be worth worrying
! 		 * about, anyway.
! 		 */
! 		xmlSetGenericErrorFunc(NULL, xml_errorHandler);
  	}
  }
  
  
--- 902,1003 ----
  
  		first_time = false;
  	}
! }
  
! /*
!  * pg_xml_init --- set up for use of libxml and register an error handler
!  *
!  * This should be called by each function that is about to use libxml
!  * facilities and requires error handling. It initialized libxml with
!  * pg_xml_init_library() and establishes our libxml error handler.
!  * The strictness determines which errors are reported and which are ignored
!  *
!  * pg_xml_done() *must* be called after the caller is done using libxml
!  * to restore the original error handler.
!  *
!  * This is exported for use by contrib/xml2, as well as other code that might
!  * wish to share use of this module's libxml error handler.
!  *
!  * TODO: xmlChar is utf8-char, make proper tuning (initdb with enc!=utf8 and
!  * check)
!  */
! void
! pg_xml_init(PgXmlStrictness strictness)
! {
! 	pg_xml_init_library();
! 
! 	Assert(strictness != PG_XML_STRICTNESS_NONE);
! 
! 	/* Imbalanced calls of pg_xml_init() and pg_xml_done() */
! 	Assert(xml_strictness == PG_XML_STRICTNESS_NONE);
! 
! 	/* Set strictness level */
! 	xml_strictness = strictness;
! 
! 	/* Reset pre-existing buffer to empty */
! 	Assert(xml_err_buf != NULL);
! 	resetStringInfo(xml_err_buf);
! 	xml_err_occurred = false;
! 
! 	/* Save original error handler and install ours. libxml originally didn't
! 	 * distinguish between the context for generic and for structured error
! 	 * handler. If we're using such an libxml version, we must thus save
! 	 * the generic error context, even though we're using a structured
! 	 * error handler.
! 	 */
! 	xml_structuredErrorFunc_saved = xmlStructuredError;
! #if HAVE_LIBXML2_XMLSTRUCTUREDERRORCONTEXT
! 	xml_structuredErrorContext_saved = xmlStructuredErrorContext;
! #else
! 	xml_structuredErrorContext_saved = xmlGenericErrorContext;
! #endif
! 	xmlSetStructuredErrorFunc(NULL, xml_errorHandler);
! }
! 
! 
! /*
!  * pg_xml_done --- restore libxml state after pg_xml_init().
!  *
!  * Resets libxml's global state (i.e. the structured error handler)
!  * to what it was originally before pg_xml_init() was called.
!  *
!  * This routine verifies that all pending errors have been dealt
!  * with (in assert-enabled builds, anyway).
!  *
!  * It is OK to call xml_ereport() after pg_xml_done(), though -
!  * pg_xml_done() leaves xml_err_buf as it is.
!  */
! void
! pg_xml_done(void)
! {
! 	/* Imbalanced calls of pg_xml_init() and pg_xml_done(). */
! 	Assert(xml_strictness != PG_XML_STRICTNESS_NONE);
! 
! 	/* Errors should have been dealt with */
! 	Assert(!xml_err_occurred);
! 
! 	xmlSetStructuredErrorFunc(xml_structuredErrorContext_saved,
! 							  xml_structuredErrorFunc_saved);
! 	xml_strictness = PG_XML_STRICTNESS_NONE;
! }
! 
! 
! 
! /*
!  * pg_xml_erroroccurred() --- Test and reset the error flag.
!  *
!  * Returns true if an libxml error occurred after the last call of
!  * this function.
!  */
! bool
! pg_xml_erroroccurred(void)
! {
! 	if (xml_err_occurred) {
! 		xml_err_occurred = false;
! 		return true;
  	}
+ 
+ 	return false;
  }
  
  
*************** parse_xml_decl(const xmlChar *str, size_
*** 969,975 ****
  	int			utf8char;
  	int			utf8len;
  
! 	pg_xml_init();
  
  	/* Initialize output arguments to "not present" */
  	if (version)
--- 1056,1066 ----
  	int			utf8char;
  	int			utf8len;
  
! 	/* Only initialize libxml. We don't need error handling here,
! 	 * but we do need to make sure libxml is initialized before
! 	 * calling any of its functions
! 	 */
! 	pg_xml_init_library();
  
  	/* Initialize output arguments to "not present" */
  	if (version)
*************** static bool
*** 1123,1130 ****
  print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone)
  {
- 	pg_xml_init();				/* why is this here? */
- 
  	if ((version && strcmp((char *) version, PG_XML_DEFAULT_VERSION) != 0)
  		|| (encoding && encoding != PG_UTF8)
  		|| standalone != -1)
--- 1214,1219 ----
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1175,1182 ****
  	int32		len;
  	xmlChar    *string;
  	xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt;
! 	xmlDocPtr	doc;
  
  	len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
  	string = xml_text2xmlChar(data);
--- 1264,1271 ----
  	int32		len;
  	xmlChar    *string;
  	xmlChar    *utf8string;
! 	xmlParserCtxtPtr ctxt = NULL;
! 	xmlDocPtr	doc = NULL;
  
  	len = VARSIZE(data) - VARHDRSZ;		/* will be useful later */
  	string = xml_text2xmlChar(data);
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1187,1203 ****
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
! 	pg_xml_init();
  	xmlInitParser();
  
- 	ctxt = xmlNewParserCtxt();
- 	if (ctxt == NULL)
- 		xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
- 					"could not allocate parser context");
- 
  	/* Use a TRY block to ensure the ctxt is released */
  	PG_TRY();
  	{
  		if (xmloption_arg == XMLOPTION_DOCUMENT)
  		{
  			/*
--- 1276,1291 ----
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
! 	pg_xml_init(PG_XML_STRICTNESS_WELLFORMED);
  	xmlInitParser();
  
  	/* Use a TRY block to ensure the ctxt is released */
  	PG_TRY();
  	{
+ 		ctxt = xmlNewParserCtxt();
+ 		XML_CHECK_AND_EREPORT(ctxt != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 							  "could not allocate parser context");
+ 
  		if (xmloption_arg == XMLOPTION_DOCUMENT)
  		{
  			/*
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1212,1220 ****
  								 "UTF-8",
  								 XML_PARSE_NOENT | XML_PARSE_DTDATTR
  						   | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
! 			if (doc == NULL)
! 				xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 							"invalid XML document");
  		}
  		else
  		{
--- 1300,1307 ----
  								 "UTF-8",
  								 XML_PARSE_NOENT | XML_PARSE_DTDATTR
  						   | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS));
! 			XML_CHECK_AND_EREPORT(doc != NULL, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 								  "invalid XML document");
  		}
  		else
  		{
*************** xml_parse(text *data, XmlOptionType xmlo
*** 1237,1259 ****
  
  			res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
  												   utf8string + count, NULL);
! 			if (res_code != 0)
! 			{
! 				xmlFreeDoc(doc);
! 				xml_ereport(ERROR, ERRCODE_INVALID_XML_CONTENT,
! 							"invalid XML content");
! 			}
  		}
  	}
  	PG_CATCH();
  	{
! 		xmlFreeParserCtxt(ctxt);
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlFreeParserCtxt(ctxt);
  
  	return doc;
  }
  
--- 1324,1350 ----
  
  			res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
  												   utf8string + count, NULL);
! 			XML_CHECK_AND_EREPORT(res_code == 0, ERROR, ERRCODE_INVALID_XML_CONTENT,
! 								  "invalid XML content");
  		}
  	}
  	PG_CATCH();
  	{
! 		if (doc != NULL)
! 			xmlFreeDoc(doc);
! 		if (ctxt != NULL)
! 			xmlFreeParserCtxt(ctxt);
! 
! 		pg_xml_done();
! 
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
  
  	xmlFreeParserCtxt(ctxt);
  
+ 	pg_xml_done();
+ 
  	return doc;
  }
  
*************** xml_ereport(int level, int sqlcode, cons
*** 1354,1366 ****
  
  	if (detail)
  	{
- 		size_t		len;
- 
- 		/* libxml error messages end in '\n'; get rid of it */
- 		len = strlen(detail);
- 		if (len > 0 && detail[len - 1] == '\n')
- 			detail[len - 1] = '\0';
- 
  		ereport(level,
  				(errcode(sqlcode),
  				 errmsg("%s", msg),
--- 1445,1450 ----
*************** xml_ereport(int level, int sqlcode, cons
*** 1376,1402 ****
  
  
  /*
!  * Error handler for libxml error messages
   */
  static void
! xml_errorHandler(void *ctxt, const char *msg,...)
  {
! 	/* Append the formatted text to xml_err_buf */
! 	for (;;)
  	{
! 		va_list		args;
! 		bool		success;
  
- 		/* Try to format the data. */
- 		va_start(args, msg);
- 		success = appendStringInfoVA(xml_err_buf, msg, args);
- 		va_end(args);
  
! 		if (success)
  			break;
  
! 		/* Double the buffer size and try again. */
! 		enlargeStringInfo(xml_err_buf, xml_err_buf->maxlen);
  	}
  }
  
--- 1460,1618 ----
  
  
  /*
!  * Append a newline after removing all existing trailing newlines
   */
  static void
! appendStringInfoLineSeparator(StringInfo str)
  {
! 	chopStringInfoNewlines(str);
! 
! 	if (str->len > 0)
! 		appendStringInfoChar(str, '\n');
! }
! 
! 
! /*
!  * Remove all trailing newlines
!  */
! static void
! chopStringInfoNewlines(StringInfo str)
! {
! 	while ((str->len > 0) &&
! 		   (str->data[str->len-1] == '\n'))
  	{
! 		str->data[--str->len] = '\0';
! 	}
! }
  
  
! /*
!  * Error handler for libxml errors and warnings
!  */
! static void
! xml_errorHandler(void* data, xmlErrorPtr error)
! {
! 	int			domain;
! 	int			level;
! 	StringInfo errorBuf = makeStringInfo();
! 
! 	/* Get parser and input context */
! 	xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr) error->ctxt;
! 	xmlParserInputPtr input = (ctxt != NULL) ? ctxt->input : NULL;
! 	xmlNodePtr node = error->node;
! 	const xmlChar* name = ((node != NULL) && (node->type == XML_ELEMENT_NODE)) ?
! 					node->name : NULL;
! 
! 	/* Older libxml versions report some errors differently.
! 	 * First, some errors were previously reported as
! 	 * coming from the parser domain but are now reported as
! 	 * coming from the namespace domain.
! 	 * Second, some warnings were upgraded to errors.
! 	 * We attempt to compensate for that here
! 	 */
! 	domain = error->domain;
! 	level = error->level;
! 	switch (error->code) {
! 		case XML_WAR_NS_URI:
! 			level = XML_ERR_ERROR;
! 			domain = XML_FROM_NAMESPACE;
  			break;
  
! 		case XML_ERR_NS_DECL_ERROR:
! 		case XML_WAR_NS_URI_RELATIVE:
! 		case XML_WAR_NS_COLUMN:
! 		case XML_NS_ERR_XML_NAMESPACE:
! 		case XML_NS_ERR_UNDEFINED_NAMESPACE:
! 		case XML_NS_ERR_QNAME:
! 		case XML_NS_ERR_ATTRIBUTE_REDEFINED:
! 		case XML_NS_ERR_EMPTY:
! 			domain = XML_FROM_NAMESPACE;
! 			break;
! 	}
! 
! 	/* Decide whether to act on the error or not */
! 	switch (domain) {
! 		case XML_FROM_PARSER:
! 		case XML_FROM_NONE:
! 		case XML_FROM_MEMORY:
! 		case XML_FROM_IO:
! 			/* Accept regardless of the parsing purpose */
! 			break;
! 
! 		default:
! 			/* Ignore during well-formedness check */
! 			if (xml_strictness == PG_XML_STRICTNESS_WELLFORMED)
! 				return;
! 			break;
! 	}
! 
! 	/* Append error message to xml_err_buf */
! 	if (error->line > 0)
! 		appendStringInfo(errorBuf, "line %d: ", error->line);
! 	if (name != NULL)
! 		appendStringInfo(errorBuf, "element %s: ", name);
! 	appendStringInfo(errorBuf, "%s", error->message);
! 
! 	/* Append context information to xml_err_buf.
! 	 * xmlParserPrintFileContext() uses the *generic* error handle to
! 	 * write the context. Since we don't want to duplicate libxml
! 	 * functionality here, we setup a generic error handler temporarily
! 	 */
! 	if (input != NULL)
! 	{
! 		/* Save generic error func and setup the xml_err_buf
! 		 * appender as generic error func. This should work because
! 		 * appendStringInfo() has essentially the same signature as
! 		 * xmlGenericErrorFunc().
! 		 */
! 		xmlGenericErrorFunc errFuncSaved = xmlGenericError;
! 		void* errCtxSaved = xmlGenericErrorContext;
! 		xmlSetGenericErrorFunc(errorBuf, (xmlGenericErrorFunc)appendStringInfo);
! 
! 		/* Generic context information */
! 		appendStringInfoLineSeparator(errorBuf);
! 		xmlParserPrintFileContext(input);
! 
! 		/* Restore generic error func */
! 		xmlSetGenericErrorFunc(errCtxSaved, errFuncSaved);
! 	}
! 
! 	chopStringInfoNewlines(errorBuf);
! 
! 	/* Legacy error handling. The error flag is never set. Exists because
! 	 * the xml2 contrib module uses our error-handling infrastructure, but
! 	 * we don't want to change its behaviour since it's deprecated anyway.
! 	 * This is also why we don't distinguish between notices, warnings
! 	 * and errors here - the old-style generic error handler wouldn't
! 	 * have done that either.
! 	 */
! 	if (xml_strictness == PG_XML_STRICTNESS_LEGACY)
! 	{
! 		appendStringInfoLineSeparator(xml_err_buf);
! 		appendStringInfoString(xml_err_buf, errorBuf->data);
! 		return;
! 	}
! 
! 	/* We don't want to ereport() here because that'd probably leave
! 	 * libxml in an inconsistent state. Instead, we remember the
! 	 * error and ereport() from xml_ereport().
! 	 *
! 	 * Warnings and notices are reported immediatly since they don't cause a
! 	 * longjmp() out of libxml.
! 	 */
! 	if (level >= XML_ERR_ERROR)
! 	{
! 		appendStringInfoLineSeparator(xml_err_buf);
! 		appendStringInfoString(xml_err_buf, errorBuf->data);
! 		xml_err_occurred = true;
! 	}
! 	else if (level >= XML_ERR_WARNING)
! 	{
! 		ereport(WARNING, (errmsg("%s", errorBuf->data)));
! 	}
! 	else
! 	{
! 		ereport(NOTICE, (errmsg("%s", errorBuf->data)));
  	}
  }
  
*************** map_sql_value_to_xml_value(Datum value, 
*** 1756,1773 ****
  					xmlTextWriterPtr writer = NULL;
  					char	   *result;
  
! 					pg_xml_init();
  
  					PG_TRY();
  					{
  						buf = xmlBufferCreate();
! 						if (!buf)
! 							xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 										"could not allocate xmlBuffer");
  						writer = xmlNewTextWriterMemory(buf, 0);
! 						if (!writer)
! 							xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 										"could not allocate xmlTextWriter");
  
  						if (xmlbinary == XMLBINARY_BASE64)
  							xmlTextWriterWriteBase64(writer, VARDATA_ANY(bstr),
--- 1972,1987 ----
  					xmlTextWriterPtr writer = NULL;
  					char	   *result;
  
! 					pg_xml_init(PG_XML_STRICTNESS_ALL);
  
  					PG_TRY();
  					{
  						buf = xmlBufferCreate();
! 						XML_CHECK_AND_EREPORT(buf != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 											  "could not allocate xmlBuffer");
  						writer = xmlNewTextWriterMemory(buf, 0);
! 						XML_CHECK_AND_EREPORT(writer != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 											  "could not allocate xmlTextWriter");
  
  						if (xmlbinary == XMLBINARY_BASE64)
  							xmlTextWriterWriteBase64(writer, VARDATA_ANY(bstr),
*************** map_sql_value_to_xml_value(Datum value, 
*** 1788,1799 ****
--- 2002,2018 ----
  							xmlFreeTextWriter(writer);
  						if (buf)
  							xmlBufferFree(buf);
+ 
+ 						pg_xml_done();
+ 
  						PG_RE_THROW();
  					}
  					PG_END_TRY();
  
  					xmlBufferFree(buf);
  
+ 					pg_xml_done();
+ 
  					return result;
  				}
  #endif   /* USE_LIBXML */
*************** xpath_internal(text *xpath_expr_text, xm
*** 3381,3387 ****
  	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
  	xpath_expr[xpath_len] = '\0';
  
! 	pg_xml_init();
  	xmlInitParser();
  
  	PG_TRY();
--- 3600,3606 ----
  	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
  	xpath_expr[xpath_len] = '\0';
  
! 	pg_xml_init(PG_XML_STRICTNESS_ALL);
  	xmlInitParser();
  
  	PG_TRY();
*************** xpath_internal(text *xpath_expr_text, xm
*** 3391,3411 ****
  		 * command execution are possible)
  		 */
  		ctxt = xmlNewParserCtxt();
! 		if (ctxt == NULL)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate parser context");
  		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 		if (doc == NULL)
! 			xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 						"could not parse XML document");
  		xpathctx = xmlXPathNewContext(doc);
! 		if (xpathctx == NULL)
! 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
! 						"could not allocate XPath context");
  		xpathctx->node = xmlDocGetRootElement(doc);
! 		if (xpathctx->node == NULL)
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"could not find root XML element");
  
  		/* register namespaces, if any */
  		if (ns_count > 0)
--- 3610,3626 ----
  		 * command execution are possible)
  		 */
  		ctxt = xmlNewParserCtxt();
! 		XML_CHECK_AND_EREPORT(ctxt != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							  "could not allocate parser context");
  		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 		XML_CHECK_AND_EREPORT(doc != NULL, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 							  "could not parse XML document");
  		xpathctx = xmlXPathNewContext(doc);
! 		XML_CHECK_AND_EREPORT(xpathctx != NULL, ERROR, ERRCODE_OUT_OF_MEMORY,
! 							  "could not allocate XPath context");
  		xpathctx->node = xmlDocGetRootElement(doc);
! 		XML_CHECK_AND_EREPORT(xpathctx->node != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 							  "could not find root XML element");
  
  		/* register namespaces, if any */
  		if (ns_count > 0)
*************** xpath_internal(text *xpath_expr_text, xm
*** 3432,3440 ****
  		}
  
  		xpathcomp = xmlXPathCompile(xpath_expr);
! 		if (xpathcomp == NULL)	/* TODO: show proper XPath error details */
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"invalid XPath expression");
  
  		/*
  		 * Version 2.6.27 introduces a function named
--- 3647,3654 ----
  		}
  
  		xpathcomp = xmlXPathCompile(xpath_expr);
! 		XML_CHECK_AND_EREPORT(xpathcomp != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 							  "invalid XPath expression");
  
  		/*
  		 * Version 2.6.27 introduces a function named
*************** xpath_internal(text *xpath_expr_text, xm
*** 3444,3452 ****
  		 * the same.
  		 */
  		xpathobj = xmlXPathCompiledEval(xpathcomp, xpathctx);
! 		if (xpathobj == NULL)	/* TODO: reason? */
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 						"could not create XPath object");
  
  		/* return empty array in cases when nothing is found */
  		if (xpathobj->nodesetval == NULL)
--- 3658,3665 ----
  		 * the same.
  		 */
  		xpathobj = xmlXPathCompiledEval(xpathcomp, xpathctx);
! 		XML_CHECK_AND_EREPORT(xpathobj != NULL, ERROR, ERRCODE_INTERNAL_ERROR,
! 							  "could not create XPath object");
  
  		/* return empty array in cases when nothing is found */
  		if (xpathobj->nodesetval == NULL)
*************** xpath_internal(text *xpath_expr_text, xm
*** 3481,3486 ****
--- 3694,3702 ----
  			xmlFreeDoc(doc);
  		if (ctxt)
  			xmlFreeParserCtxt(ctxt);
+ 
+ 		pg_xml_done();
+ 
  		PG_RE_THROW();
  	}
  	PG_END_TRY();
*************** xpath_internal(text *xpath_expr_text, xm
*** 3490,3495 ****
--- 3706,3713 ----
  	xmlXPathFreeContext(xpathctx);
  	xmlFreeDoc(doc);
  	xmlFreeParserCtxt(ctxt);
+ 
+ 	pg_xml_done();
  }
  #endif   /* USE_LIBXML */
  
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 6359cd6..1baaabe 100644
*** a/src/include/utils/xml.h
--- b/src/include/utils/xml.h
*************** typedef enum
*** 68,74 ****
  	XML_STANDALONE_OMITTED
  }	XmlStandaloneType;
  
! extern void pg_xml_init(void);
  extern void xml_ereport(int level, int sqlcode, const char *msg);
  extern xmltype *xmlconcat(List *args);
  extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
--- 68,87 ----
  	XML_STANDALONE_OMITTED
  }	XmlStandaloneType;
  
! typedef enum
! {
! 	PG_XML_STRICTNESS_NONE		/* No error handling */,
! 	PG_XML_STRICTNESS_LEGACY	/* Ignore notices/warnings/errors unless
! 								 * function result indicates error condition
! 								 */,
! 	PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser notices/warnings/errors */,
! 	PG_XML_STRICTNESS_ALL		/* Report all notices/warnings/errors */,
! } PgXmlStrictness;
! 
! extern void pg_xml_init_library(void);
! extern void pg_xml_init(PgXmlStrictness strictness);
! extern void pg_xml_done(void);
! extern bool pg_xml_erroroccurred(void);
  extern void xml_ereport(int level, int sqlcode, const char *msg);
  extern xmltype *xmlconcat(List *args);
  extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index eaa5a74..364ce51 100644
*** a/src/test/regress/expected/xml.out
--- b/src/test/regress/expected/xml.out
*************** INSERT INTO xmltest VALUES (3, '<wrong')
*** 8,14 ****
  ERROR:  invalid XML content
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
  <wrong
        ^
  SELECT * FROM xmltest;
--- 8,14 ----
  ERROR:  invalid XML content
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
  <wrong
        ^
  SELECT * FROM xmltest;
*************** SELECT xmlconcat('bad', '<syntax');
*** 62,68 ****
  ERROR:  invalid XML content
  LINE 1: SELECT xmlconcat('bad', '<syntax');
                                  ^
! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag syntax line 1
  <syntax
         ^
  SELECT xmlconcat('<foo/>', NULL, '<?xml version="1.1" standalone="no"?><bar/>');
--- 62,68 ----
  ERROR:  invalid XML content
  LINE 1: SELECT xmlconcat('bad', '<syntax');
                                  ^
! DETAIL:  line 1: Couldn't find end of Start Tag syntax line 1
  <syntax
         ^
  SELECT xmlconcat('<foo/>', NULL, '<?xml version="1.1" standalone="no"?><bar/>');
*************** SELECT xmlparse(content '<abc>x</abc>');
*** 206,214 ****
   <abc>x</abc>
  (1 row)
  
  SELECT xmlparse(document 'abc');
  ERROR:  invalid XML document
! DETAIL:  Entity: line 1: parser error : Start tag expected, '<' not found
  abc
  ^
  SELECT xmlparse(document '<abc>x</abc>');
--- 206,259 ----
   <abc>x</abc>
  (1 row)
  
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: xmlParseEntityRef: no name
+ <invalidentity>&</invalidentity>
+                 ^
+ line 1: chunk is not well balanced
+ <invalidentity>&</invalidentity>
+                                 ^
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <undefinedentity>&idontexist;</undefinedentity>
+                              ^
+ line 1: chunk is not well balanced
+ <undefinedentity>&idontexist;</undefinedentity>
+                                                ^
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+          xmlparse          
+ ---------------------------
+  <invalidns xmlns='&lt;'/>
+ (1 row)
+ 
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+             xmlparse            
+ --------------------------------
+  <relativens xmlns='relative'/>
+ (1 row)
+ 
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  invalid XML content
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <twoerrors>&idontexist;</unbalanced>
+                        ^
+ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ line 1: chunk is not well balanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
+       xmlparse       
+ ---------------------
+  <nosuchprefix:tag/>
+ (1 row)
+ 
  SELECT xmlparse(document 'abc');
  ERROR:  invalid XML document
! DETAIL:  line 1: Start tag expected, '<' not found
  abc
  ^
  SELECT xmlparse(document '<abc>x</abc>');
*************** SELECT xmlparse(document '<abc>x</abc>')
*** 217,222 ****
--- 262,309 ----
   <abc>x</abc>
  (1 row)
  
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: xmlParseEntityRef: no name
+ <invalidentity>&</abc>
+                 ^
+ line 1: Opening and ending tag mismatch: invalidentity line 1 and abc
+ <invalidentity>&</abc>
+                       ^
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <undefinedentity>&idontexist;</abc>
+                              ^
+ line 1: Opening and ending tag mismatch: undefinedentity line 1 and abc
+ <undefinedentity>&idontexist;</abc>
+                                    ^
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+          xmlparse          
+ ---------------------------
+  <invalidns xmlns='&lt;'/>
+ (1 row)
+ 
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+             xmlparse            
+ --------------------------------
+  <relativens xmlns='relative'/>
+ (1 row)
+ 
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  invalid XML document
+ DETAIL:  line 1: Entity 'idontexist' not defined
+ <twoerrors>&idontexist;</unbalanced>
+                        ^
+ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
+ <twoerrors>&idontexist;</unbalanced>
+                                     ^
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
+       xmlparse       
+ ---------------------
+  <nosuchprefix:tag/>
+ (1 row)
+ 
  SELECT xmlpi(name foo);
    xmlpi  
  ---------
*************** SELECT '<>' IS NOT DOCUMENT;
*** 379,385 ****
  ERROR:  invalid XML content
  LINE 1: SELECT '<>' IS NOT DOCUMENT;
                 ^
! DETAIL:  Entity: line 1: parser error : StartTag: invalid element name
  <>
   ^
  SELECT xmlagg(data) FROM xmltest;
--- 466,472 ----
  ERROR:  invalid XML content
  LINE 1: SELECT '<>' IS NOT DOCUMENT;
                 ^
! DETAIL:  line 1: StartTag: invalid element name
  <>
   ^
  SELECT xmlagg(data) FROM xmltest;
*************** EXECUTE foo ('bad');
*** 425,431 ****
  ERROR:  invalid XML document
  LINE 1: EXECUTE foo ('bad');
                       ^
! DETAIL:  Entity: line 1: parser error : Start tag expected, '<' not found
  bad
  ^
  SET XML OPTION CONTENT;
--- 512,518 ----
  ERROR:  invalid XML document
  LINE 1: EXECUTE foo ('bad');
                       ^
! DETAIL:  line 1: Start tag expected, '<' not found
  bad
  ^
  SET XML OPTION CONTENT;
*************** SELECT xml_is_well_formed('<pg:foo xmlns
*** 679,684 ****
--- 766,801 ----
   t
  (1 row)
  
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+  xml_is_well_formed 
+ --------------------
+  t
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+  xml_is_well_formed 
+ --------------------
+  t
+ (1 row)
+ 
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
+  xml_is_well_formed 
+ --------------------
+  f
+ (1 row)
+ 
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
   xml_is_well_formed 
*************** SELECT xml_is_well_formed('abc');
*** 686,688 ****
--- 803,838 ----
   t
  (1 row)
  
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ -- Since different libxml versions emit slightly different
+ -- error messages, we suppress them
+ \set VERBOSITY terse
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ ERROR:  could not parse XML document
+ \set VERBOSITY default
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ ERROR:  could not parse XML document
+ DETAIL:  line 1: Namespace prefix nosuchprefix on tag is not defined
+ <nosuchprefix:tag/>
+                  ^
+ CONTEXT:  SQL function "xpath" statement 1
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+ WARNING:  line 1: xmlns: URI relative is not absolute
+ <relativens xmlns='relative'/>
+                             ^
+ CONTEXT:  SQL function "xpath" statement 1
+                 xpath                 
+ --------------------------------------
+  {"<relativens xmlns=\"relative\"/>"}
+ (1 row)
+ 
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 711b435..75e78a0 100644
*** a/src/test/regress/expected/xml_1.out
--- b/src/test/regress/expected/xml_1.out
*************** SELECT xmlparse(content '<abc>x</abc>');
*** 172,177 ****
--- 172,201 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SELECT xmlparse(document 'abc');
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
*************** SELECT xmlparse(document '<abc>x</abc>')
*** 180,185 ****
--- 204,233 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SELECT xmlpi(name foo);
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
*************** SELECT xml_is_well_formed('<pg:foo xmlns
*** 627,634 ****
--- 675,731 ----
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
+ ERROR:  unsupported XML feature
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
  ERROR:  unsupported XML feature
  DETAIL:  This functionality requires the server to be built with libxml support.
  HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ -- Since different libxml versions emit slightly different
+ -- error messages, we suppress them
+ \set VERBOSITY terse
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ ERROR:  unsupported XML feature at character 20
+ \set VERBOSITY default
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ ERROR:  unsupported XML feature
+ LINE 1: SELECT xpath('/*', '<nosuchprefix:tag/>');
+                            ^
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+ ERROR:  unsupported XML feature
+ LINE 1: SELECT xpath('/*', '<relativens xmlns=''relative''/>');
+                            ^
+ DETAIL:  This functionality requires the server to be built with libxml support.
+ HINT:  You need to rebuild PostgreSQL using --with-libxml.
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 717a1e7..b1a3a2c 100644
*** a/src/test/regress/sql/xml.sql
--- b/src/test/regress/sql/xml.sql
*************** SELECT xmlelement(name foo, xmlattribute
*** 62,70 ****
--- 62,82 ----
  
  SELECT xmlparse(content 'abc');
  SELECT xmlparse(content '<abc>x</abc>');
+ SELECT xmlparse(content '<invalidentity>&</invalidentity>');
+ SELECT xmlparse(content '<undefinedentity>&idontexist;</undefinedentity>');
+ SELECT xmlparse(content '<invalidns xmlns=''&lt;''/>');
+ SELECT xmlparse(content '<relativens xmlns=''relative''/>');
+ SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>');
+ SELECT xmlparse(content '<nosuchprefix:tag/>');
  
  SELECT xmlparse(document 'abc');
  SELECT xmlparse(document '<abc>x</abc>');
+ SELECT xmlparse(document '<invalidentity>&</abc>');
+ SELECT xmlparse(document '<undefinedentity>&idontexist;</abc>');
+ SELECT xmlparse(document '<invalidns xmlns=''&lt;''/>');
+ SELECT xmlparse(document '<relativens xmlns=''relative''/>');
+ SELECT xmlparse(document '<twoerrors>&idontexist;</unbalanced>');
+ SELECT xmlparse(document '<nosuchprefix:tag/>');
  
  
  SELECT xmlpi(name foo);
*************** SELECT xml_is_well_formed('<foo><bar>baz
*** 208,213 ****
--- 220,251 ----
  SELECT xml_is_well_formed('<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
  SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>');
  SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</pg:foo>');
+ SELECT xml_is_well_formed('<invalidentity>&</abc>');
+ SELECT xml_is_well_formed('<undefinedentity>&idontexist;</abc>');
+ SELECT xml_is_well_formed('<invalidns xmlns=''&lt;''/>');
+ SELECT xml_is_well_formed('<relativens xmlns=''relative''/>');
+ SELECT xml_is_well_formed('<twoerrors>&idontexist;</unbalanced>');
  
  SET xmloption TO CONTENT;
  SELECT xml_is_well_formed('abc');
+ 
+ -- Since xpath() deals with namespaces, it's a bit stricter about
+ -- what's well-formed and what's not. If we don't obey these rules
+ -- (i.e. ignore namespace-related errors from libxml), xpath()
+ -- fails in subtle ways. The following would for example produce
+ -- the xml value
+ --   <invalidns xmlns='<'/>
+ -- which is invalid beecause '<' may not appear un-escaped in
+ -- attribute values.
+ -- Since different libxml versions emit slightly different
+ -- error messages, we suppress them
+ \set VERBOSITY terse
+ SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
+ \set VERBOSITY default
+ 
+ -- Again, the XML isn't well-formed for namespace purposes
+ SELECT xpath('/*', '<nosuchprefix:tag/>');
+ 
+ -- XPath deprecates relative namespaces, but they're not supposed to
+ -- thrown an error, only a warning.
+ SELECT xpath('/*', '<relativens xmlns=''relative''/>');
#14Noah Misch
noah@leadboat.com
In reply to: Florian Pflug (#13)
Re: Another issue with invalid XML values

On Mon, Jun 27, 2011 at 12:45:02AM +0200, Florian Pflug wrote:

Updated patch attached. Do you think this is "Ready for Committer"?

Thanks. Yes; I have just marked it that way.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#13)
Re: Another issue with invalid XML values

Florian Pflug <fgp@phlo.org> writes:

Updated patch attached. Do you think this is "Ready for Committer"?

I've been looking through this patch. While it's mostly good, I'm
pretty unhappy with the way that the pg_xml_init/pg_xml_done code is
deliberately designed to be non-reentrant (ie, throw an Assert if
pg_xml_init is called twice without pg_xml_done in between).
There are at least two issues with that:

1. If you forget pg_xml_done in some code path, you'll find out from
an Assert at the next pg_xml_init, which is probably far away from where
the actual problem is.

2. I don't think it's entirely unlikely that uses of libxml could be
nested.

xpath_table in particular calls an awful lot of stuff between
pg_xml_init and pg_xml_done, and is at the very least open to loss of
control via an elog before it's called pg_xml_done.

I think this patch has already paid 90% of the notational price for
supporting fully re-entrant use of libxml. What I'm imagining is
that we move all five of the static variables (xml_strictness,
xml_err_occurred, xml_err_buf, xml_structuredErrorFunc_saved,
xml_structuredErrorContext_saved) into a struct that's palloc'd
by pg_xml_init and eventually passed to pg_xml_done. It could be
passed to xml_errorHandler via the currently-unused context argument.
A nice side benefit is that we could get rid of PG_XML_STRICTNESS_NONE.

Now the risk factor if we do that is that if someone misses a
pg_xml_done call, we leave an error handler installed with a context
argument that's probably pointing at garbage, and if someone then tries
to use libxml without re-establishing their error handler, they've
got problems. But they'd have problems anyway with the current form of
the patch. We could provide some defense against this by including a
magic identifier value in the palloc'd struct and making
xml_errorHandler check it before doing anything dangerous. Also, we
could make pg_xml_done warn if libxml's current context pointer is
different from the struct passed to it, which would provide another
means of detection that somebody had missed a cleanup call.

Unless someone sees a major hole in this idea, or a better way to do it,
I'm going to modify the patch along those lines and commit.

regards, tom lane

#16Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#15)
Re: Another issue with invalid XML values

Excerpts from Tom Lane's message of mar jul 19 19:42:54 -0400 2011:

Now the risk factor if we do that is that if someone misses a
pg_xml_done call, we leave an error handler installed with a context
argument that's probably pointing at garbage, and if someone then tries
to use libxml without re-establishing their error handler, they've
got problems. But they'd have problems anyway with the current form of
the patch. We could provide some defense against this by including a
magic identifier value in the palloc'd struct and making
xml_errorHandler check it before doing anything dangerous. Also, we
could make pg_xml_done warn if libxml's current context pointer is
different from the struct passed to it, which would provide another
means of detection that somebody had missed a cleanup call.

Unless someone sees a major hole in this idea, or a better way to do it,
I'm going to modify the patch along those lines and commit.

I don't see any holes in this idea (though I didn't look very hard), but
I was thinking that maybe it's time for this module to hook onto the
cleanup stuff for the xact error case; or at least have a check that it
has been properly cleaned up elesewhere. Maybe this can be made to work
reentrantly if there's a global var holding the current context, and it
contains a link to the next one up the stack. At least, my impression
is that the PG_TRY blocks are already messy.

BTW I'd like to know your opinion on the fact that this patch added
two new StringInfo routines defined as static in xml.c. It seems to me
that if we're going to extend some module's API we should do it properly
in its own files; otherwise we're bound to repeat the functionality
elsewhere, and lose opportunities for cleaning up some other code that
could presumably use similar functionality.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#16)
Re: Another issue with invalid XML values

[ resend due to mail server hiccup ]

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tom Lane's message of mar jul 19 19:42:54 -0400 2011:

Now the risk factor if we do that is that if someone misses a
pg_xml_done call, we leave an error handler installed with a context
argument that's probably pointing at garbage, and if someone then tries
to use libxml without re-establishing their error handler, they've
got problems.

I don't see any holes in this idea (though I didn't look very hard), but
I was thinking that maybe it's time for this module to hook onto the
cleanup stuff for the xact error case; or at least have a check that it
has been properly cleaned up elesewhere. Maybe this can be made to work
reentrantly if there's a global var holding the current context, and it
contains a link to the next one up the stack. At least, my impression
is that the PG_TRY blocks are already messy.

Yeah, that's another way we could go. But I'm not sure how well it
would interact with potential third-party modules setting up their own
libxml error handlers. Anybody have a thought about that?

BTW I'd like to know your opinion on the fact that this patch added
two new StringInfo routines defined as static in xml.c. It seems to me
that if we're going to extend some module's API we should do it properly
in its own files; otherwise we're bound to repeat the functionality
elsewhere, and lose opportunities for cleaning up some other code that
could presumably use similar functionality.

I did think about that for a little bit, but the functions in question
are only a couple lines long and seem rather specialized to what xml.c
needs. I'd just as soon leave them as-is until we actually have a
second use-case to help with picking a generalized API.

regards, tom lane

#18Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#15)
Re: Another issue with invalid XML values

On Jul20, 2011, at 01:42 , Tom Lane wrote:

Florian Pflug <fgp@phlo.org> writes:

Updated patch attached. Do you think this is "Ready for Committer"?

I've been looking through this patch. While it's mostly good, I'm
pretty unhappy with the way that the pg_xml_init/pg_xml_done code is
deliberately designed to be non-reentrant (ie, throw an Assert if
pg_xml_init is called twice without pg_xml_done in between).
There are at least two issues with that:

Hm, yeah, I did it that way because I didn't see an immediate need
to support nested pg_xml_init/done calls. But you're right - since
we're already nearly there, we might as well go all the way.

1. If you forget pg_xml_done in some code path, you'll find out from
an Assert at the next pg_xml_init, which is probably far away from where
the actual problem is.

Very true. In fact, I did miss one pg_xml_done() call in the xml2
contrib module initially, and it took me a while to locate the place
it was missing.

But won't me miss that error entirely if me make it re-entrant?

2. I don't think it's entirely unlikely that uses of libxml could be
nested.

xpath_table in particular calls an awful lot of stuff between
pg_xml_init and pg_xml_done, and is at the very least open to loss of
control via an elog before it's called pg_xml_done.

True. But note that this function's error handling leaves something
to be desired anyway - I think it'll leak memory if it elog()s, for
example.

I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
like the ones in core's xml.c do. The reasons I held back was I that
I felt these cleanups weren't part of the problem my patch was trying
to solve. At least according to our docs the xml2 contrib module is
also deprecated, which was another reason to make only the absolutely
necessary adjustments.

I think this patch has already paid 90% of the notational price for
supporting fully re-entrant use of libxml. What I'm imagining is
that we move all five of the static variables (xml_strictness,
xml_err_occurred, xml_err_buf, xml_structuredErrorFunc_saved,
xml_structuredErrorContext_saved) into a struct that's palloc'd
by pg_xml_init and eventually passed to pg_xml_done. It could be
passed to xml_errorHandler via the currently-unused context argument.
A nice side benefit is that we could get rid of PG_XML_STRICTNESS_NONE.

I'd marginally prefer for the caller, not pg_xml_init(), to be responsible
for allocating the struct. That gives the caller the option to simply
allocate it on the stack.

Now the risk factor if we do that is that if someone misses a
pg_xml_done call, we leave an error handler installed with a context
argument that's probably pointing at garbage, and if someone then tries
to use libxml without re-establishing their error handler, they've
got problems. But they'd have problems anyway with the current form of
the patch.

Currently it'd actually work in non-assert-enabled builds, so long
as no third-party library depends on its libxml error handler being
restored by us (which the old code simply assume never to be the case).

We could provide some defense against this by including a
magic identifier value in the palloc'd struct and making
xml_errorHandler check it before doing anything dangerous. Also, we
could make pg_xml_done warn if libxml's current context pointer is
different from the struct passed to it, which would provide another
means of detection that somebody had missed a cleanup call.

There's one additional hazard. Suppose you have a functions f() and
g() defined as

g() {
PgXmlState* xml_state = pg_xml_init();
...
/* Ups. No pg_xml_done() calll here */
}

f() {
PgXmlState* xml_state = pg_xml_init();

g();

xmlSomeLibXmlFunctionCall();
XML_CHECK_AND_EREPORT(xml_state, ...);

pg_xml_done();
}

Error reported by xmlSomeLibXmlFunctionCall() will now be
missed by the XML_CHECK_AND_EREPORT in f(), because they'll
modify g()'s xml_state, not f()'s.

So we really ought to make pg_xml_done() complain if libxml's
current error context isn't what it expects.

Unless someone sees a major hole in this idea, or a better way to do it,
I'm going to modify the patch along those lines and commit.

If pg_xml_done() and the error handler do the safety check you suggested,
it seems fine.

best regards,
Florian Pflug

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: Another issue with invalid XML values

I wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I was thinking that maybe it's time for this module to hook onto the
cleanup stuff for the xact error case; or at least have a check that it
has been properly cleaned up elesewhere. Maybe this can be made to work
reentrantly if there's a global var holding the current context, and it
contains a link to the next one up the stack. At least, my impression
is that the PG_TRY blocks are already messy.

Yeah, that's another way we could go. But I'm not sure how well it
would interact with potential third-party modules setting up their own
libxml error handlers. Anybody have a thought about that?

I thought a bit more about this, and realized that there's a big
obstacle to getting rid of the PG_TRY blocks this way: most of them are
responsible for telling libxml to free some data structures, not just
restoring the error handler state. We can't drop that aspect without
introducing session-lifespan memory leaks.

In principle we could expand the responsibilities of a
transaction-cleanup hook to include freeing data structures as well as
restoring error handlers, but the idea seems a lot messier and hence
less attractive than it did before. I was ready to get rid of the
PG_TRY blocks until I came across this problem, but now I think I'll
stick with them.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#18)
Re: Another issue with invalid XML values

Florian Pflug <fgp@phlo.org> writes:

On Jul20, 2011, at 01:42 , Tom Lane wrote:

1. If you forget pg_xml_done in some code path, you'll find out from
an Assert at the next pg_xml_init, which is probably far away from where
the actual problem is.

Very true. In fact, I did miss one pg_xml_done() call in the xml2
contrib module initially, and it took me a while to locate the place
it was missing.

But won't me miss that error entirely if me make it re-entrant?

Yeah, I don't see any very easy way to detect a missed pg_xml_done call,
but having it be an Assert failure some time later isn't good from a
robustness standpoint.

I'm of the opinion at this point that the most reliable solution is to
have a coding convention that pg_xml_init and pg_xml_done MUST be
called in the style

pg_xml_init();
PG_TRY();
...
PG_CATCH();
{
...
pg_xml_done();
PG_RE_THROW();
}
PG_END_TRY();
pg_xml_done();

If we convert contrib/xml2 over to this style, we can get rid of some of
the weirder aspects of the LEGACY mode, such as allowing xml_ereport to
occur after pg_xml_done (which would be problematic for my proposal,
since I want pg_xml_done to pfree the state including the message
buffer).

I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
like the ones in core's xml.c do. The reasons I held back was I that
I felt these cleanups weren't part of the problem my patch was trying
to solve.

Fair point, but contorting the error handling to avoid changing xml2/ a
bit more doesn't seem like a good decision to me. It's not like we
aren't forcing some change on that module already.

So we really ought to make pg_xml_done() complain if libxml's
current error context isn't what it expects.

Right, got that coded already.

regards, tom lane

#21Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#20)
Re: Another issue with invalid XML values

On Jul20, 2011, at 17:08 , Tom Lane wrote:

Florian Pflug <fgp@phlo.org> writes:

On Jul20, 2011, at 01:42 , Tom Lane wrote:

1. If you forget pg_xml_done in some code path, you'll find out from
an Assert at the next pg_xml_init, which is probably far away from where
the actual problem is.

But won't me miss that error entirely if me make it re-entrant?

I'm of the opinion at this point that the most reliable solution is to
have a coding convention that pg_xml_init and pg_xml_done MUST be
called in the style

pg_xml_init();
PG_TRY();
...
PG_CATCH();
{
...
pg_xml_done();
PG_RE_THROW();
}
PG_END_TRY();
pg_xml_done();

If we convert contrib/xml2 over to this style, we can get rid of some of
the weirder aspects of the LEGACY mode, such as allowing xml_ereport to
occur after pg_xml_done

Fine with me.

(which would be problematic for my proposal,
since I want pg_xml_done to pfree the state including the message
buffer).

I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
pfree it, but I'm kinda curious about why you prefer that over making it
the callers responsibility and letting callers use a stack-allocated
struct if they wish to.

I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
like the ones in core's xml.c do. The reasons I held back was I that
I felt these cleanups weren't part of the problem my patch was trying
to solve.

Fair point, but contorting the error handling to avoid changing xml2/ a
bit more doesn't seem like a good decision to me. It's not like we
aren't forcing some change on that module already.

Fair enough. Are you going to do that, or do you want me to produce an
updated patch? I can do that, but probably not before the weekend.

best regards,
Florian Pflug

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#21)
Re: Another issue with invalid XML values

Florian Pflug <fgp@phlo.org> writes:

I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
pfree it, but I'm kinda curious about why you prefer that over making it
the callers responsibility and letting callers use a stack-allocated
struct if they wish to.

We could do it that way, but it would require exposing the struct
definition to callers. As I have it coded ATM, the struct is an
opaque typedef in xml.h and only known within xml.c, which decouples
contrib/xml2 from any changes in it. Another point is that if we
changed our minds and went over to a transaction cleanup hook,
stack-allocated structs wouldn't work at all. Lastly, even if we
did stack-allocate the control struct, the message buffer has to be
palloc'd so it can be expanded at need.

Fair enough. Are you going to do that, or do you want me to produce an
updated patch? I can do that, but probably not before the weekend.

No, I'm working on it, almost done already.

regards, tom lane

#23Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#22)
Re: Another issue with invalid XML values

On Jul20, 2011, at 17:37 , Tom Lane wrote:

Florian Pflug <fgp@phlo.org> writes:

I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
pfree it, but I'm kinda curious about why you prefer that over making it
the callers responsibility and letting callers use a stack-allocated
struct if they wish to.

We could do it that way, but it would require exposing the struct
definition to callers. As I have it coded ATM, the struct is an
opaque typedef in xml.h and only known within xml.c, which decouples
contrib/xml2 from any changes in it. Another point is that if we
changed our minds and went over to a transaction cleanup hook,
stack-allocated structs wouldn't work at all. Lastly, even if we
did stack-allocate the control struct, the message buffer has to be
palloc'd so it can be expanded at need.

You've convinced me, thanks for the detailed explanation!

Fair enough. Are you going to do that, or do you want me to produce an
updated patch? I can do that, but probably not before the weekend.

No, I'm working on it, almost done already.

Cool, thanks!

best regards,
Florian Pflug

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#21)
Re: Another issue with invalid XML values

I've committed this patch with the discussed changes and some other
editorialization. I have to leave for an appointment and can't write
anything now about the changes, but feel free to ask questions if you
have any.

regards, tom lane

#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
Re: Another issue with invalid XML values

Tom Lane wrote:

I've committed this patch with the discussed changes and some other
editorialization. I have to leave for an appointment and can't write
anything now about the changes, but feel free to ask questions if you
have any.

Did this fix any of the XML TODOs?

http://wiki.postgresql.org/wiki/Todo#XML

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#25)
Re: Another issue with invalid XML values

Bruce Momjian <bruce@momjian.us> writes:

Did this fix any of the XML TODOs?
http://wiki.postgresql.org/wiki/Todo#XML

No, not that I know of.

regards, tom lane

#27Bernd Helmle
mailings@oopsware.de
In reply to: Tom Lane (#24)
Re: Another issue with invalid XML values

--On 20. Juli 2011 13:06:17 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've committed this patch with the discussed changes and some other
editorialization. I have to leave for an appointment and can't write
anything now about the changes, but feel free to ask questions if you
have any.

Hmm, when building against libxml2 2.7.8 i get reproducible failing regression
tests on OSX 10.6.7. It is griping with

WARNING: libxml error handling state is out of sync with xml.c

all over the place.

A quick check with compiling against the libxml2 shipped with OSX (which seems
libxml2 2.7.3) causes everything to work as expected, however.

--
Thanks

Bernd

#28Florian Pflug
fgp@phlo.org
In reply to: Bernd Helmle (#27)
Re: Another issue with invalid XML values

On Jul25, 2011, at 18:53 , Bernd Helmle wrote:

--On 20. Juli 2011 13:06:17 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've committed this patch with the discussed changes and some other
editorialization. I have to leave for an appointment and can't write
anything now about the changes, but feel free to ask questions if you
have any.

Hmm, when building against libxml2 2.7.8 i get reproducible failing
regression tests on OSX 10.6.7. It is griping with

WARNING: libxml error handling state is out of sync with xml.c

all over the place.

A quick check with compiling against the libxml2 shipped with OSX
(which seems libxml2 2.7.3) causes everything to work as expected, however.

Hm, I have libxml2 2.7.8, installed via Mac Ports, and I cannot reproduce
this. Maybe Mac Ports uses a modified libxml2, though. I'll check that.

Where did you obtain libxml2 from?

best regards,
Florian Pflug

#29Bernd Helmle
mailings@oopsware.de
In reply to: Florian Pflug (#28)
Re: Another issue with invalid XML values

--On 25. Juli 2011 19:07:50 +0200 Florian Pflug <fgp@phlo.org> wrote:

Hm, I have libxml2 2.7.8, installed via Mac Ports, and I cannot reproduce
this. Maybe Mac Ports uses a modified libxml2, though. I'll check that.

Where did you obtain libxml2 from?

This is MacPorts, too:

% port installed libxml2
The following ports are currently installed:
libxml2 @2.7.8_0 (active)

I've reduced my configure line to the least required options

./configure --with-libxml --with-includes=/opt/local/include
--with-libraries=/opt/local/lib

but still get the WARNINGs in the regression.diffs. Which settings do you use?

--
Thanks

Bernd

#30Florian Pflug
fgp@phlo.org
In reply to: Bernd Helmle (#29)
Re: Another issue with invalid XML values

On Jul25, 2011, at 19:37 , Bernd Helmle wrote:

--On 25. Juli 2011 19:07:50 +0200 Florian Pflug <fgp@phlo.org> wrote:

Hm, I have libxml2 2.7.8, installed via Mac Ports, and I cannot reproduce
this. Maybe Mac Ports uses a modified libxml2, though. I'll check that.

Where did you obtain libxml2 from?

This is MacPorts, too:

% port installed libxml2
The following ports are currently installed:
libxml2 @2.7.8_0 (active)

'bout the same here:

$ port installed libxml2
The following ports are currently installed:
libxml2 @2.7.8_0+universal (active)

I've reduced my configure line to the least required options

./configure --with-libxml --with-includes=/opt/local/include --with-libraries=/opt/local/lib

but still get the WARNINGs in the regression.diffs.

I got a theory. We do distinguish between libxml2 versions for which
the structured and the generic error context handler share the error
context (older ones), and those with don't (newer ones). Our configure
scripts checks for the availability of xmlStructuredErrorContext, and
defined HAVE_XMLSTRUCTUREDERRORCONTEXT if it is. Now, if for some reason
that test fails on your machine, even though libxml *does* provide
xmlStructuredErrorContext, then the safety-check in the error handler
would check whether xmlGenericErrorContext is set as expected, when
it really should check xmlStructuredErrorContext.

Could you check if configure defines that macro? You should find
it in the pg_config.h generated by configure.

Which settings do you use?

configure \
--prefix=/Users/fgp/Installs/pg.master.max.noassert.O1 \
--with-includes=/opt/local/include \
--with-libraries=/opt/local/lib \
--enable-debug \
--enable-depend \
--enable-thread-safety \
--with-pgport=54320 \
--without-tcl \
--with-perl \
--with-python \
--without-gssapi \
--without-krb5 \
--without-pam \
--without-ldap \
--without-bonjour \
--without-openssl \
--without-ossp-uuid \
--with-libxml \
--with-libxslt CFLAGS="-pipe -O1 -g"

I also checked with otool -L that it really uses the libxml from /opt.

$ otool -L .//src/test/regress/tmp_check/install/Users/fgp/Installs/pg.master.max.noassert.O1/bin/postgres
.//src/test/regress/tmp_check/install/Users/fgp/Installs/pg.master.max.noassert.O1/bin/postgres:
/opt/local/lib/libxml2.2.dylib (compatibility version 10.0.0, current version 10.8.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.11)

Despite the file name, that should be libxml 2.7.8. Here's the output of
xml2-config

$ /opt/local/bin/xml2-config --version
2.7.8

And there's no other libxml2 in /opt.

best regards,
Florian Pflug

#31Bernd Helmle
mailings@oopsware.de
In reply to: Florian Pflug (#30)
Re: Another issue with invalid XML values

--On 25. Juli 2011 19:57:40 +0200 Florian Pflug <fgp@phlo.org> wrote:

I got a theory. We do distinguish between libxml2 versions for which
the structured and the generic error context handler share the error
context (older ones), and those with don't (newer ones). Our configure
scripts checks for the availability of xmlStructuredErrorContext, and
defined HAVE_XMLSTRUCTUREDERRORCONTEXT if it is. Now, if for some reason
that test fails on your machine, even though libxml *does* provide
xmlStructuredErrorContext, then the safety-check in the error handler
would check whether xmlGenericErrorContext is set as expected, when
it really should check xmlStructuredErrorContext.

Could you check if configure defines that macro? You should find
it in the pg_config.h generated by configure.

This is what pg_config.h says:

% grep HAVE_XMLSTRUCTUREDERRORCONTEXT src/include/pg_config.h
/* #undef HAVE_XMLSTRUCTUREDERRORCONTEXT */

Ah, but i got now what's wrong here: configure is confusing both libxml2
installations, and a quick look into config.log proves that: it uses the
xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
MacPorts, though i seem to recall to have changed this in the past....).

So, all i need to do is

XML2_CONFIG=/opt/local/bin/xml2-config ./configure --with-libxml
--with-includes=/opt/local/include/ --with-libraries=/opt/local/lib

and everything is smooth:

% grep HAVE_XMLSTRUCTUREDERRORCONTEXT src/include/pg_config.h#define
HAVE_XMLSTRUCTUREDERRORCONTEXT 1

Regression tests passes now. This was too obvious...

--
Thanks

Bernd

#32Florian Pflug
fgp@phlo.org
In reply to: Bernd Helmle (#31)
Re: Another issue with invalid XML values

On Jul25, 2011, at 20:37 , Bernd Helmle wrote:

Ah, but i got now what's wrong here: configure is confusing both libxml2
installations, and a quick look into config.log proves that: it uses the
xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
MacPorts, though i seem to recall to have changed this in the past....).

So, all i need to do is

XML2_CONFIG=/opt/local/bin/xml2-config ./configure --with-libxml
--with-includes=/opt/local/include/ --with-libraries=/opt/local/lib

and everything is smooth:

% grep HAVE_XMLSTRUCTUREDERRORCONTEXT src/include/pg_config.h
#define HAVE_XMLSTRUCTUREDERRORCONTEXT 1

Regression tests passes now. This was too obvious...

Hm, but I still think there's a bug lurking there. Using a different libxml2
version for the configure checks than for actual builds surely isn't good...

From looking at configure.in, it seems that we use xml2-config to figure out
the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
somehow end up not using these flags when we later test for
xmlStructuredErrorContext, but do use them during the actual build. Or maybe
the order of the -I and -L flags just ends up being different in the two cases.

My skills in the black art that are autotools are severely lacking, so it's
quite likely that I somehow botched the incantations we use to test for
xmlStructuredErrorContext. I don't really know where to start looking for the
error, though. Ideas, anyone?

best regards,
Florian Pflug

#33Noah Misch
noah@2ndQuadrant.com
In reply to: Florian Pflug (#32)
Re: Another issue with invalid XML values

On Mon, Jul 25, 2011 at 09:06:41PM +0200, Florian Pflug wrote:

On Jul25, 2011, at 20:37 , Bernd Helmle wrote:

Ah, but i got now what's wrong here: configure is confusing both libxml2
installations, and a quick look into config.log proves that: it uses the
xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
MacPorts, though i seem to recall to have changed this in the past....).

Hm, but I still think there's a bug lurking there. Using a different libxml2
version for the configure checks than for actual builds surely isn't good...

From looking at configure.in, it seems that we use xml2-config to figure out
the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
somehow end up not using these flags when we later test for
xmlStructuredErrorContext, but do use them during the actual build. Or maybe
the order of the -I and -L flags just ends up being different in the two cases.

I can reproduce similar behavior on GNU/Linux. If my setup was sufficiently
similar, Bernd's problematic build would have used this sequence of directives
during both configuration and build:

-I/usr/include/libxml2 -I/opt/local/include -L/opt/local/lib

The directories passed using --with-includes and --with-libraries took
priority over those from xml2-config. Since libxml2 headers live in a
`libxml2' subdirectory, --with-includes=/opt/local/include did not affect
finding them. --with-libraries=/opt/local/lib *did* affect finding the
library binaries, though. Therefore, he built entirely against /usr headers
and /opt/local libraries. We could rearrange things so the xml2-config -L
flags (or lack thereof) take priority over a --with-libraries directory for
the purpose of finding libxml2.

As a side note, we don't add an rpath for libxml2 like we do for Perl and
Python. That doesn't matter on Darwin, but with GNU libc, it entails setting
LD_LIBRARY_PATH or updating /etc/ld.so.conf to make the run time linker find
the library binary used at build time.

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#34Florian Pflug
fgp@phlo.org
In reply to: Noah Misch (#33)
Re: Another issue with invalid XML values

On Jul26, 2011, at 01:57 , Noah Misch wrote:

On Mon, Jul 25, 2011 at 09:06:41PM +0200, Florian Pflug wrote:

On Jul25, 2011, at 20:37 , Bernd Helmle wrote:

Ah, but i got now what's wrong here: configure is confusing both libxml2
installations, and a quick look into config.log proves that: it uses the
xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
MacPorts, though i seem to recall to have changed this in the past....).

Hm, but I still think there's a bug lurking there. Using a different libxml2
version for the configure checks than for actual builds surely isn't good...

From looking at configure.in, it seems that we use xml2-config to figure out
the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
somehow end up not using these flags when we later test for
xmlStructuredErrorContext, but do use them during the actual build. Or maybe
the order of the -I and -L flags just ends up being different in the two cases.

I can reproduce similar behavior on GNU/Linux. If my setup was sufficiently
similar, Bernd's problematic build would have used this sequence of directives
during both configuration and build:

-I/usr/include/libxml2 -I/opt/local/include -L/opt/local/lib

The directories passed using --with-includes and --with-libraries took
priority over those from xml2-config. Since libxml2 headers live in a
`libxml2' subdirectory, --with-includes=/opt/local/include did not affect
finding them. --with-libraries=/opt/local/lib *did* affect finding the
library binaries, though. Therefore, he built entirely against /usr headers
and /opt/local libraries.

Right. Thanks for detailed analysis, Noah!

What's more, xml.c actually does attempt to protect against this situation
by calling LIBXML_TEST_VERSION in pg_xml_init_library().

But that check doesn't fire in our case, because it explicitly allows the
actual library version to be newer than the header file version, as long
as the major versions agree (the major version being "2" in this case).

So in essence, libxml promises ABI backwards-compatibility, but breaks
that promise when it comes to restoring the structured error context.

The only fix I can come up with is to explicitly test whether or not we're
able to restore the structured error context in pg_xml_init_library(). I'm
envisioning we'd so something like

xmlStructuredErrorFunc *saved_errfunc = xmlStructuredError;
#if HAVE_XMLSTRUCTUREDERRORCONTEXT
void *saved_errctx = xmlStructuredErrorContext;
#else
void *saved_errctx = xmlGenericErrorContext;
#endif

xmlSetStructuredErrorFunc((void *) MAGIC, xml_errorHandler);

#ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
if (xmlStructuredErrorContext != MAGIC)
#else
if (xmlGenericErrorContext != MAGIC)
#endif
{
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("unable to restore the libxml error context"),
errhint("Postgres was probably built against libxml < 2.7.4 but loaded a version >= 2.7.4 at runtime")))
}

xmlSetStructuredErrorFunc(saved_errctx, saved_errfunc);

The downside of this is that a libxml version upgrade might break postgres,
of course. But by the time postgres 9.1 is released, 2.7.4 will be nearly 3
years old, so I dunno how like it is that a distro would afterwards decide to
upgrade from a version earlier than that to a version later than that.

We could rearrange things so the xml2-config -L
flags (or lack thereof) take priority over a --with-libraries directory for
the purpose of finding libxml2.

Hm, how would we do that just for the purpose of finding libxml? Do autotools
provide a way to specify per-library -L flags?

As a side note, we don't add an rpath for libxml2 like we do for Perl and
Python. That doesn't matter on Darwin, but with GNU libc, it entails setting
LD_LIBRARY_PATH or updating /etc/ld.so.conf to make the run time linker find
the library binary used at build time.

Sounds like we should change that.

best regards,
Florian Pflug

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#34)
Re: Another issue with invalid XML values

Florian Pflug <fgp@phlo.org> writes:

What's more, xml.c actually does attempt to protect against this situation
by calling LIBXML_TEST_VERSION in pg_xml_init_library().

But that check doesn't fire in our case, because it explicitly allows the
actual library version to be newer than the header file version, as long
as the major versions agree (the major version being "2" in this case).

So in essence, libxml promises ABI backwards-compatibility, but breaks
that promise when it comes to restoring the structured error context.

Fun.

The only fix I can come up with is to explicitly test whether or not we're
able to restore the structured error context in pg_xml_init_library().

Yeah, I think you are right.

The downside of this is that a libxml version upgrade might break postgres,

Such an upgrade would break Postgres anyway --- better we are able to
detect and report it clearly than that we fail in bizarre ways.

As a side note, we don't add an rpath for libxml2 like we do for Perl and
Python.

Sounds like we should change that.

I don't think so. It would just be another headache for packagers ---
in Red Hat distros, at least, packages are explicitly forbidden from
containing any rpath settings.

regards, tom lane

#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
Re: Another issue with invalid XML values

On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

As a side note, we don't add an rpath for libxml2 like we do for Perl and
Python.

Sounds like we should change that.

I don't think so.  It would just be another headache for packagers ---
in Red Hat distros, at least, packages are explicitly forbidden from
containing any rpath settings.

So what do they do about Perl and Python?

Not sure I understand the virtue of being inconsistent here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#35)
Re: Another issue with invalid XML values

On Jul26, 2011, at 15:52 , Tom Lane wrote:

Florian Pflug <fgp@phlo.org> writes:

The only fix I can come up with is to explicitly test whether or not we're
able to restore the structured error context in pg_xml_init_library().

Yeah, I think you are right.

The downside of this is that a libxml version upgrade might break postgres,

Such an upgrade would break Postgres anyway --- better we are able to
detect and report it clearly than that we fail in bizarre ways.

On further reflection, instead of checking whether we can restore the error
handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
in pg_xml_done() to ereport(ERROR), and include a hint that explains the
probably cause.

The upside being that we only fail when we actually need to restore the
error handler. Since there's one caller (parse_xml_decl) who calls
pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
isn't only academic. At least XML would output will continue to work
work after libxml is upgraded from < 2.7.4 to >= 2.7.4.

If nobody objects, I'll post a patch to that effect later today.

BTW, come to think of it, I believe the elog(ERROR) that you put into
xml_errorHandler should to be upgraded to elog(FATAL). Once we longjmp()
out of libxml, it doesn't seem safe to re-enter it, so making the backend
exit seems to be the only safe thing to do.

best regards,
Florian Pflug

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#37)
Re: Another issue with invalid XML values

Florian Pflug <fgp@phlo.org> writes:

On further reflection, instead of checking whether we can restore the error
handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
in pg_xml_done() to ereport(ERROR), and include a hint that explains the
probably cause.

The upside being that we only fail when we actually need to restore the
error handler. Since there's one caller (parse_xml_decl) who calls
pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
isn't only academic. At least XML would output will continue to work
work after libxml is upgraded from < 2.7.4 to >= 2.7.4.

Good point. But what about failing in pg_xml_init? That is, after
calling xmlSetStructuredErrorFunc, check that it set the variable we
expected it to set.

The purpose of the check in pg_xml_done is not to detect library issues,
but to detect omitted save/restore pairs and similar coding mistakes.
I don't want to upgrade it to an ERROR, and I don't want to confuse
people by hinting that the problem is with libxml.

regards, tom lane

#39Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#38)
Re: Another issue with invalid XML values

On Jul26, 2011, at 16:22 , Tom Lane wrote:

Florian Pflug <fgp@phlo.org> writes:

On further reflection, instead of checking whether we can restore the error
handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
in pg_xml_done() to ereport(ERROR), and include a hint that explains the
probably cause.

The upside being that we only fail when we actually need to restore the
error handler. Since there's one caller (parse_xml_decl) who calls
pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
isn't only academic. At least XML would output will continue to work
work after libxml is upgraded from < 2.7.4 to >= 2.7.4.

Good point. But what about failing in pg_xml_init? That is, after
calling xmlSetStructuredErrorFunc, check that it set the variable we
expected it to set.

Yeah, that's even better. Will do it that way.

What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
to elog(FATAL)? Shall I do that too, or leave it for now?

The purpose of the check in pg_xml_done is not to detect library issues,
but to detect omitted save/restore pairs and similar coding mistakes.
I don't want to upgrade it to an ERROR, and I don't want to confuse
people by hinting that the problem is with libxml.

I can see the concern about possible confusion, and I agree that
pg_xml_init(), right after we set our error handler, is the most logical
place to test whether we can read it back.

I guess I don't really see the benefit of the check in pg_xml_done()
triggering a WARNING instead of an ERROR, but since we won't be hijacking
that check now anyway, I don't mind it being a WARNING either.

best regards,
Florian Pflug

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)
Re: Another issue with invalid XML values

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't think so. �It would just be another headache for packagers ---
in Red Hat distros, at least, packages are explicitly forbidden from
containing any rpath settings.

So what do they do about Perl and Python?

Patch the source to not add rpath switches.

regards, tom lane

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#39)
Re: Another issue with invalid XML values

Florian Pflug <fgp@phlo.org> writes:

What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
to elog(FATAL)? Shall I do that too, or leave it for now?

No objection here --- I had considered writing it that way myself.
I refrained for fear of making a bad situation even worse, but if
other people think FATAL would be the safer way, fine.

regards, tom lane

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#40)
Re: Another issue with invalid XML values

Tom Lane <tgl@sss.pgh.pa.us> writes:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't think so. �It would just be another headache for packagers ---
in Red Hat distros, at least, packages are explicitly forbidden from
containing any rpath settings.

So what do they do about Perl and Python?

Patch the source to not add rpath switches.

No, I take that back. What the RH packages do is configure with
--disable-rpath, and so long as that applies to this too, I'd
have no objection.

What I was mis-remembering is that there's a patch that *puts back*
plperl's rpath for libperl, because for some reason that no one has ever
satisfactorily explained to me, perl has an exemption from the distro
policy that loadable libraries must be installed so that they're known
to ldconfig. Which of course is exactly why rpath isn't (supposed to
be) needed.

regards, tom lane

#43Noah Misch
noah@2ndQuadrant.com
In reply to: Florian Pflug (#34)
Re: Another issue with invalid XML values

On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:

On Jul26, 2011, at 01:57 , Noah Misch wrote:

We could rearrange things so the xml2-config -L
flags (or lack thereof) take priority over a --with-libraries directory for
the purpose of finding libxml2.

Hm, how would we do that just for the purpose of finding libxml? Do autotools
provide a way to specify per-library -L flags?

Autoconf (built-in macros, that is) and Automake do not get involved in that. I
vaguely recall Libtool having support for it, but PostgreSQL does not use
Automake or Libtool. I also spoke too loosely: -L is never per-library, but you
can emulate that by completing the search externally and passing a full path to
the linker.

Honestly, the benefit is probably too light to justify going to this trouble.

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#44Florian Pflug
fgp@phlo.org
In reply to: Noah Misch (#43)
Re: Another issue with invalid XML values

On Jul26, 2011, at 18:04 , Noah Misch wrote:

On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:

On Jul26, 2011, at 01:57 , Noah Misch wrote:

We could rearrange things so the xml2-config -L
flags (or lack thereof) take priority over a --with-libraries directory for
the purpose of finding libxml2.

Hm, how would we do that just for the purpose of finding libxml? Do autotools
provide a way to specify per-library -L flags?

Autoconf (built-in macros, that is) and Automake do not get involved in that. I
vaguely recall Libtool having support for it, but PostgreSQL does not use
Automake or Libtool. I also spoke too loosely: -L is never per-library, but you
can emulate that by completing the search externally and passing a full path to
the linker.

Yeah, that was what I though would be the only way too. I had the slight hope
that I had missed something, but unfortunately it seems I didn't :-(

Honestly, the benefit is probably too light to justify going to this trouble.

Yep. It'd probably be weeks, if not months, until we made that work on all
supported platforms. And by the time it does, we'd probably have re-invented
a sizeable portion of libtool.

best regards,
Florian Pflug

#45Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#41)
1 attachment(s)
Re: Another issue with invalid XML values

On Jul26, 2011, at 17:07 , Tom Lane wrote:

Florian Pflug <fgp@phlo.org> writes:

What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
to elog(FATAL)? Shall I do that too, or leave it for now?

No objection here --- I had considered writing it that way myself.
I refrained for fear of making a bad situation even worse, but if
other people think FATAL would be the safer way, fine.

Patch attached.

pg_xml_init() now checks the error context after setting it, and
ereport(ERROR)s if the check fails. I've made it so that the errhint()
which suggest that compiling against libxml <= 2.7.3 but running
against > 2.7.3 might be the reason is only emitted if we actually
compiled against an older version. This is meant to avoid confusion
should there ever be another reason for that check to fail.

I've also changed the elog(ERROR) to elog(FATAL) in xml_errorHandler().

I've pondered whether to add a check to configure which verifies that
the headers match the libxml version exactly at compile time. In the end,
I didn't do that, for two reasons. First, there isn't anything wrong with
using older headers together with a newer libxml, so long as both versions
are either <= 2.7.3 or > 2.7.3. And second, with such a check in place,
the only way to exercise libxml's promised ABI compatibility is to upgrade
the libxml 2package after compiling postgres. That, however, is unlikely
to happen except on production servers, and so by adding the check we'd
increase the chance of ABI-compatibility problems remaining undetected
during development and testing.

best regards,
Florian Pflug

Attachments:

pg_xml_errctxcheck.patchapplication/octet-stream; name=pg_xml_errctxcheck.patchDownload
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f3db3f0..94ad656 100644
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
*************** PgXmlErrorContext *
*** 928,933 ****
--- 928,934 ----
  pg_xml_init(PgXmlStrictness strictness)
  {
  	PgXmlErrorContext *errcxt;
+ 	void	   *cur_errcxt;
  
  	/* Do one-time setup if needed */
  	pg_xml_init_library();
*************** pg_xml_init(PgXmlStrictness strictness)
*** 956,961 ****
--- 957,991 ----
  
  	xmlSetStructuredErrorFunc((void *) errcxt, xml_errorHandler);
  
+ 	/*
+ 	 * Verify that we're able to read back the error context we just set.
+ 	 * If this fails, the error context we saved above is probably equally
+ 	 * bogus, and since that leave us without a way to restore the context
+ 	 * in pg_xml_done(), we complain early and loudly. The only known
+ 	 * situation in which we fail this test is if we compile with headers
+ 	 * from a libxml2 which doesn't track the structured error context
+ 	 * separately (<= 2.7.3), but (dynamically) link a version that does.
+ 	 */
+ 
+ #ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
+ 	cur_errcxt = xmlStructuredErrorContext;
+ #else
+ 	cur_errcxt = xmlGenericErrorContext;
+ #endif
+ 	if (cur_errcxt != (void *) errcxt)
+ 	{
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED)
+ 				,errmsg("could not setup XML error handler")
+ 				,errdetail("libxml2 error context could not be queried")
+ #ifndef HAVE_XMLSTRUCTUREDERRORCONTEXT
+ 				,errhint("Postgres was built with a libxml version without "
+ 						 "xmlStructuredErrorContext but is probably now using "
+ 						 "a version with it. This situation is unsupported")
+ #endif
+ 		));
+ 	}
+ 
  	return errcxt;
  }
  
*************** xml_errorHandler(void *data, xmlErrorPtr
*** 1494,1502 ****
  	int			level = error->level;
  	StringInfo	errorBuf;
  
! 	/* Defend against someone passing us a bogus context struct */
  	if (xmlerrcxt->magic != ERRCXT_MAGIC)
! 		elog(ERROR, "xml_errorHandler called with invalid PgXmlErrorContext");
  
  	/*----------
  	 * Older libxml versions report some errors differently.
--- 1524,1536 ----
  	int			level = error->level;
  	StringInfo	errorBuf;
  
! 	/*
! 	 * Defend against someone passing us a bogus context struct.
! 	 * We force a backend exit if this check fails because longjmp()ing
! 	 * out of libxml might render it unuseable.
! 	 */
  	if (xmlerrcxt->magic != ERRCXT_MAGIC)
! 		elog(FATAL, "xml_errorHandler called with invalid PgXmlErrorContext");
  
  	/*----------
  	 * Older libxml versions report some errors differently.
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Pflug (#45)
Re: Another issue with invalid XML values

Florian Pflug <fgp@phlo.org> writes:

Patch attached.

Will check and apply this.

I've pondered whether to add a check to configure which verifies that
the headers match the libxml version exactly at compile time. In the end,
I didn't do that, for two reasons. First, there isn't anything wrong with
using older headers together with a newer libxml, so long as both versions
are either <= 2.7.3 or > 2.7.3. And second, with such a check in place,
the only way to exercise libxml's promised ABI compatibility is to upgrade
the libxml 2package after compiling postgres. That, however, is unlikely
to happen except on production servers, and so by adding the check we'd
increase the chance of ABI-compatibility problems remaining undetected
during development and testing.

I concur with *not* doing that.

regards, tom lane