pg_parse_json() should not leak token copies on failure

Started by Jacob Championover 1 year ago10 messages
#1Jacob Champion
jacob.champion@enterprisedb.com
1 attachment(s)

Hi,

When a client of our JSON parser registers semantic action callbacks,
the parser will allocate copies of the lexed tokens to pass into those
callbacks. The intent is for the callbacks to always take ownership of
the copied memory, and if they don't want it then they can pfree() it.

However, if parsing fails on the token before the callback is invoked,
that allocation is leaked. That's not a problem for the server side,
or for clients that immediately exit on parse failure, but if the JSON
parser gets added to libpq for OAuth, it will become more of a
problem.

(I'm building a fuzzer to flush out these sorts of issues; not only
does it complain loudly about the leaks, but the accumulation of
leaked memory puts a hard limit on how long a fuzzer run can last. :D)

Attached is a draft patch to illustrate what I mean, but it's
incomplete: it only solves the problem for scalar values. We also make
a copy of object field names, which is much harder to fix, because we
make only a single copy and pass that to both the object_field_start
and object_field_end callbacks. Consider:

- If a client only implements object_field_start, it takes ownership
of the field name when we call it. It can free the allocation if it
decides that the field is irrelevant.
- The same is true for clients that only implement object_field_end.
- If a client implements both callbacks, it takes ownership of the
field name when we call object_field_start. But irrelevant field names
can't be freed during that callback, because the same copy is going to
be passed to object_field_end. And object_field_end isn't guaranteed
to be called, because parsing could fail for any number of reasons
between now and then. So what code should be responsible for cleanup?
The parser doesn't know whether the callback already freed the pointer
or kept a reference to it in semstate.

Any thoughts on how we can improve this? I was thinking we could maybe
make two copies of the field name and track ownership individually?

Thanks,
--Jacob

Attachments:

0001-WIP-fix-leak-of-scalar-value-on-lex-failure.patchapplication/octet-stream; name=0001-WIP-fix-leak-of-scalar-value-on-lex-failure.patchDownload
From 70e4c02beb73b92f5c85a4d8f82ad938d7a075b5 Mon Sep 17 00:00:00 2001
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Thu, 25 Apr 2024 15:26:40 -0700
Subject: [PATCH] WIP: fix leak of scalar value on lex failure

TODO: field names
---
 src/common/jsonapi.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 6633503490..6339e909a4 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -380,10 +380,9 @@ makeJsonLexContextIncremental(JsonLexContext *lex, int encoding,
 	lex->incremental = true;
 	lex->inc_state = palloc0(sizeof(JsonIncrementalState));
 	initStringInfo(&(lex->inc_state->partial_token));
-	lex->pstack = palloc(sizeof(JsonParserStack));
+	lex->pstack = palloc0(sizeof(JsonParserStack));
 	lex->pstack->stack_size = JS_STACK_CHUNK_SIZE;
 	lex->pstack->prediction = palloc(JS_STACK_CHUNK_SIZE * JS_MAX_PROD_LEN);
-	lex->pstack->pred_index = 0;
 	lex->pstack->fnames = palloc(JS_STACK_CHUNK_SIZE * sizeof(char *));
 	lex->pstack->fnull = palloc(JS_STACK_CHUNK_SIZE * sizeof(bool));
 	if (need_escapes)
@@ -495,6 +494,7 @@ freeJsonLexContext(JsonLexContext *lex)
 		pfree(lex->pstack->prediction);
 		pfree(lex->pstack->fnames);
 		pfree(lex->pstack->fnull);
+		pfree(lex->pstack->scalar_val);
 		pfree(lex->pstack);
 	}
 
@@ -912,6 +912,13 @@ pg_parse_json_incremental(JsonLexContext *lex,
 						if (sfunc != NULL)
 						{
 							result = (*sfunc) (sem->semstate, pstack->scalar_val, pstack->scalar_tok);
+
+							/*
+							 * Ownership of the token allocation has passed to
+							 * the callback; don't free it later.
+							 */
+							pstack->scalar_val = NULL;
+
 							if (result != JSON_SUCCESS)
 								return result;
 						}
@@ -1040,9 +1047,12 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 	/* consume the token */
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		pfree(val);
 		return result;
+	}
 
-	/* invoke the callback */
+	/* invoke the callback, which takes ownership of val */
 	result = (*sfunc) (sem->semstate, val, tok);
 
 	return result;
-- 
2.34.1

#2Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Jacob Champion (#1)
1 attachment(s)
Re: pg_parse_json() should not leak token copies on failure

On Tue, Apr 30, 2024 at 2:29 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:

Attached is a draft patch to illustrate what I mean, but it's
incomplete: it only solves the problem for scalar values.

(Attached is a v2 of that patch; in solving a frontend leak I should
probably not introduce a backend segfault.)

--Jacob

Attachments:

v2-0001-WIP-fix-leak-of-scalar-value-on-lex-failure.patchapplication/octet-stream; name=v2-0001-WIP-fix-leak-of-scalar-value-on-lex-failure.patchDownload
From f1883b933e106fc4dd0d3fac5b3d6a2cf3d9167d Mon Sep 17 00:00:00 2001
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Thu, 25 Apr 2024 15:26:40 -0700
Subject: [PATCH v2] WIP: fix leak of scalar value on lex failure

TODO: field names
---
 src/common/jsonapi.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index a8764302e4..b108b80ad4 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -380,10 +380,9 @@ makeJsonLexContextIncremental(JsonLexContext *lex, int encoding,
 	lex->incremental = true;
 	lex->inc_state = palloc0(sizeof(JsonIncrementalState));
 	initStringInfo(&(lex->inc_state->partial_token));
-	lex->pstack = palloc(sizeof(JsonParserStack));
+	lex->pstack = palloc0(sizeof(JsonParserStack));
 	lex->pstack->stack_size = JS_STACK_CHUNK_SIZE;
 	lex->pstack->prediction = palloc(JS_STACK_CHUNK_SIZE * JS_MAX_PROD_LEN);
-	lex->pstack->pred_index = 0;
 	lex->pstack->fnames = palloc(JS_STACK_CHUNK_SIZE * sizeof(char *));
 	lex->pstack->fnull = palloc(JS_STACK_CHUNK_SIZE * sizeof(bool));
 	if (need_escapes)
@@ -495,6 +494,8 @@ freeJsonLexContext(JsonLexContext *lex)
 		pfree(lex->pstack->prediction);
 		pfree(lex->pstack->fnames);
 		pfree(lex->pstack->fnull);
+		if (lex->pstack->scalar_val)
+			pfree(lex->pstack->scalar_val);
 		pfree(lex->pstack);
 	}
 
@@ -912,6 +913,13 @@ pg_parse_json_incremental(JsonLexContext *lex,
 						if (sfunc != NULL)
 						{
 							result = (*sfunc) (sem->semstate, pstack->scalar_val, pstack->scalar_tok);
+
+							/*
+							 * Ownership of the token allocation has passed to
+							 * the callback; don't free it later.
+							 */
+							pstack->scalar_val = NULL;
+
 							if (result != JSON_SUCCESS)
 								return result;
 						}
@@ -1040,9 +1048,13 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 	/* consume the token */
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		if (val)
+			pfree(val);
 		return result;
+	}
 
-	/* invoke the callback */
+	/* invoke the callback, which takes ownership of val */
 	result = (*sfunc) (sem->semstate, val, tok);
 
 	return result;
-- 
2.34.1

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Jacob Champion (#2)
Re: pg_parse_json() should not leak token copies on failure

Hi,

At Fri, 24 May 2024 08:43:01 -0700, Jacob Champion <jacob.champion@enterprisedb.com> wrote in

On Tue, Apr 30, 2024 at 2:29 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:

Attached is a draft patch to illustrate what I mean, but it's
incomplete: it only solves the problem for scalar values.

(Attached is a v2 of that patch; in solving a frontend leak I should
probably not introduce a backend segfault.)

I understand that the part enclosed in parentheses refers to the "if
(ptr) pfree(ptr)" structure. I believe we rely on the behavior of
free(NULL), which safely does nothing. (I couldn't find the related
discussion due to a timeout error on the ML search page.)

Although I don't fully understand the entire parser code, it seems
that the owner transfer only occurs in JSON_TOKEN_STRING cases. That
is, the memory pointed by scalar_val would become dangling in
JSON_TOKEN_NUBMER cases. Even if this is not the case, the ownership
transition apperas quite callenging to follow.

It might be safer or clearer to pstrdup the token in jsonb_in_scalar()
and avoid NULLifying scalar_val after calling callbacks, or to let
jsonb_in_sclar() NULLify the pointer.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Kyotaro Horiguchi (#3)
Re: pg_parse_json() should not leak token copies on failure

On Mon, Jun 3, 2024 at 9:32 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Hi,

Thanks for the review!

I understand that the part enclosed in parentheses refers to the "if
(ptr) pfree(ptr)" structure. I believe we rely on the behavior of
free(NULL), which safely does nothing. (I couldn't find the related
discussion due to a timeout error on the ML search page.)

For the frontend, right. For the backend, pfree(NULL) causes a crash.
I think [1]/messages/by-id/cf26e970-8e92-59f1-247a-aa265235075b@enterprisedb.com is a related discussion on the list, maybe the one you
were looking for?

Although I don't fully understand the entire parser code, it seems
that the owner transfer only occurs in JSON_TOKEN_STRING cases. That
is, the memory pointed by scalar_val would become dangling in
JSON_TOKEN_NUBMER cases.

It should happen for both cases, I think. I see that the
JSON_SEM_SCALAR_CALL case passes scalar_val into the callback
regardless of whether we're parsing a string or a number, and
parse_scalar() does the same thing over in the recursive
implementation. Have I missed a code path?

Even if this is not the case, the ownership
transition apperas quite callenging to follow.

I agree!

It might be safer or clearer to pstrdup the token in jsonb_in_scalar()
and avoid NULLifying scalar_val after calling callbacks, or to let
jsonb_in_sclar() NULLify the pointer.

Maybe. jsonb_in_scalar isn't the only scalar callback implementation,
though. It might be a fairly cruel change for dependent extensions,
since there will be no compiler complaint.

If a compatibility break is acceptable, maybe we should make the API
zero-copy for the recursive case, and just pass the length of the
parsed token? Either way, we'd have to change the object field
callbacks, too, but maybe an API change makes even more sense there,
given how convoluted it is right now.

Thanks,
--Jacob

[1]: /messages/by-id/cf26e970-8e92-59f1-247a-aa265235075b@enterprisedb.com

#5Michael Paquier
michael@paquier.xyz
In reply to: Jacob Champion (#4)
Re: pg_parse_json() should not leak token copies on failure

On Tue, Jun 04, 2024 at 10:10:06AM -0700, Jacob Champion wrote:

On Mon, Jun 3, 2024 at 9:32 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

I understand that the part enclosed in parentheses refers to the "if
(ptr) pfree(ptr)" structure. I believe we rely on the behavior of
free(NULL), which safely does nothing. (I couldn't find the related
discussion due to a timeout error on the ML search page.)

For the frontend, right. For the backend, pfree(NULL) causes a crash.
I think [1] is a related discussion on the list, maybe the one you
were looking for?

FYI, these choices relate also to 805a397db40b, e890ce7a4feb and
098c703d308f.
--
Michael

#6Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Michael Paquier (#5)
1 attachment(s)
Re: pg_parse_json() should not leak token copies on failure

(Tom, thank you for the fixup in 75240f65!)

Off-list, Andrew suggested an alternative approach to the one I'd been
taking: let the client choose whether it wants to take token ownership
to begin with. v3 adds a new flag (and associated API) which will
allow libpq to refuse ownership of those tokens. The lexer is then
free to clean everything up during parse failures.

Usually I'm not a fan of "do the right thing" flags, but in this case,
libpq really is the outlier. And it's nice that existing clients
(potentially including extensions) don't have to worry about an API
change.

At the moment, we have a test matrix consisting of "standard frontend"
and "shlib frontend" tests for the incremental parser. I'm planning
for the v4 patch to extend that with a "owned/not owned" dimension;
any objections?

Thanks,
--Jacob

Attachments:

v3-0001-jsonapi-add-lexer-option-to-keep-token-ownership.patchapplication/octet-stream; name=v3-0001-jsonapi-add-lexer-option-to-keep-token-ownership.patchDownload
From 81b1d7cdb64e91a59670609510c08a55ced04095 Mon Sep 17 00:00:00 2001
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Thu, 25 Apr 2024 15:26:40 -0700
Subject: [PATCH v3] jsonapi: add lexer option to keep token ownership

Commit 0785d1b8b adds support for libpq as a JSON client, but
allocations for string tokens can still be leaked during parsing
failures. This is tricky to fix for the object_field semantic callbacks:
the field name must remain valid until the end of the object, but if a
parsing error is encountered partway through, object_field_end() won't
be invoked and the client won't get a chance to free the field name.

At Andrew's suggestion, add a flag to switch the ownership of parsed
tokens to the lexer. When this is enabled, the client must make a copy
of any tokens it wants to persist past the callback lifetime, but the
lexer will handle necessary cleanup on failure.
---
 src/common/jsonapi.c         | 90 ++++++++++++++++++++++++++++++++----
 src/include/common/jsonapi.h | 28 +++++++++--
 2 files changed, 107 insertions(+), 11 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 45838d8a18..79f5ffa238 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -280,6 +280,7 @@ static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSem
 static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem);
 static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
 static bool allocate_incremental_state(JsonLexContext *lex);
+static inline void set_fname(JsonLexContext *lex, char *fname);
 
 /* the null action object used for pure validation */
 const JsonSemAction nullSemAction =
@@ -437,7 +438,7 @@ allocate_incremental_state(JsonLexContext *lex)
 			   *fnull;
 
 	lex->inc_state = ALLOC0(sizeof(JsonIncrementalState));
-	pstack = ALLOC(sizeof(JsonParserStack));
+	pstack = ALLOC0(sizeof(JsonParserStack));
 	prediction = ALLOC(JS_STACK_CHUNK_SIZE * JS_MAX_PROD_LEN);
 	fnames = ALLOC(JS_STACK_CHUNK_SIZE * sizeof(char *));
 	fnull = ALLOC(JS_STACK_CHUNK_SIZE * sizeof(bool));
@@ -464,10 +465,17 @@ allocate_incremental_state(JsonLexContext *lex)
 	lex->pstack = pstack;
 	lex->pstack->stack_size = JS_STACK_CHUNK_SIZE;
 	lex->pstack->prediction = prediction;
-	lex->pstack->pred_index = 0;
 	lex->pstack->fnames = fnames;
 	lex->pstack->fnull = fnull;
 
+	/*
+	 * fnames between 0 and lex_level must always be defined so that
+	 * freeJsonLexContext() can handle them safely. inc/dec_lex_level() handle
+	 * the rest.
+	 */
+	Assert(lex->lex_level == 0);
+	lex->pstack->fnames[0] = NULL;
+
 	lex->incremental = true;
 	return true;
 }
@@ -530,6 +538,15 @@ makeJsonLexContextIncremental(JsonLexContext *lex, int encoding,
 	return lex;
 }
 
+void
+setJsonLexContextOwnsTokens(JsonLexContext *lex, bool owned_by_context)
+{
+	if (owned_by_context)
+		lex->flags |= JSONLEX_CTX_OWNS_TOKENS;
+	else
+		lex->flags &= ~JSONLEX_CTX_OWNS_TOKENS;
+}
+
 static inline bool
 inc_lex_level(JsonLexContext *lex)
 {
@@ -569,12 +586,23 @@ inc_lex_level(JsonLexContext *lex)
 	}
 
 	lex->lex_level += 1;
+
+	if (lex->incremental)
+	{
+		/*
+		 * Ensure freeJsonLexContext() remains safe even if no fname is assigned
+		 * at this level.
+		 */
+		lex->pstack->fnames[lex->lex_level] = NULL;
+	}
+
 	return true;
 }
 
 static inline void
 dec_lex_level(JsonLexContext *lex)
 {
+	set_fname(lex, NULL); /* free the current level's fname, if needed */
 	lex->lex_level -= 1;
 }
 
@@ -608,6 +636,15 @@ have_prediction(JsonParserStack *pstack)
 static inline void
 set_fname(JsonLexContext *lex, char *fname)
 {
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+	{
+		/*
+		 * Don't leak prior fnames. If one hasn't been assigned yet,
+		 * inc_lex_level ensured that it's NULL (and therefore safe to free).
+		 */
+		FREE(lex->pstack->fnames[lex->lex_level]);
+	}
+
 	lex->pstack->fnames[lex->lex_level] = fname;
 }
 
@@ -655,8 +692,19 @@ freeJsonLexContext(JsonLexContext *lex)
 		jsonapi_termStringInfo(&lex->inc_state->partial_token);
 		FREE(lex->inc_state);
 		FREE(lex->pstack->prediction);
+
+		if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		{
+			int			i;
+
+			/* Clean up any tokens that were left behind. */
+			for (i = 0; i <= lex->lex_level; i++)
+				FREE(lex->pstack->fnames[i]);
+		}
+
 		FREE(lex->pstack->fnames);
 		FREE(lex->pstack->fnull);
+		FREE(lex->pstack->scalar_val);
 		FREE(lex->pstack);
 	}
 
@@ -1086,6 +1134,17 @@ pg_parse_json_incremental(JsonLexContext *lex,
 						if (sfunc != NULL)
 						{
 							result = (*sfunc) (sem->semstate, pstack->scalar_val, pstack->scalar_tok);
+
+							/*
+							 * Either ownership of the token passed to the
+							 * callback, or we need to free it now. Either way,
+							 * clear our pointer to it so it doesn't get freed
+							 * in the future.
+							 */
+							if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+								FREE(pstack->scalar_val);
+							pstack->scalar_val = NULL;
+
 							if (result != JSON_SUCCESS)
 								return result;
 						}
@@ -1221,11 +1280,17 @@ parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 	/* consume the token */
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(val);
 		return result;
+	}
 
-	/* invoke the callback */
+	/* invoke the callback, which may take ownership of val */
 	result = (*sfunc) (sem->semstate, val, tok);
 
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		FREE(val);
+
 	return result;
 }
 
@@ -1238,7 +1303,7 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	 * generally call a field name a "key".
 	 */
 
-	char	   *fname = NULL;	/* keep compiler quiet */
+	char	   *fname = NULL;
 	json_ofield_action ostart = sem->object_field_start;
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
@@ -1255,11 +1320,17 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	}
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(fname);
 		return result;
+	}
 
 	result = lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(fname);
 		return result;
+	}
 
 	tok = lex_peek(lex);
 	isnull = tok == JSON_TOKEN_NULL;
@@ -1268,7 +1339,7 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	{
 		result = (*ostart) (sem->semstate, fname, isnull);
 		if (result != JSON_SUCCESS)
-			return result;
+			goto ofield_cleanup;
 	}
 
 	switch (tok)
@@ -1283,16 +1354,19 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 			result = parse_scalar(lex, sem);
 	}
 	if (result != JSON_SUCCESS)
-		return result;
+		goto ofield_cleanup;
 
 	if (oend != NULL)
 	{
 		result = (*oend) (sem->semstate, fname, isnull);
 		if (result != JSON_SUCCESS)
-			return result;
+			goto ofield_cleanup;
 	}
 
-	return JSON_SUCCESS;
+ofield_cleanup:
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		FREE(fname);
+	return result;
 }
 
 static JsonParseErrorType
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index c524ff5be8..167615a557 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -92,9 +92,11 @@ typedef struct JsonIncrementalState JsonIncrementalState;
  * conjunction with token_start.
  *
  * JSONLEX_FREE_STRUCT/STRVAL are used to drive freeJsonLexContext.
+ * JSONLEX_CTX_OWNS_TOKENS is used by setJsonLexContextOwnsTokens.
  */
 #define JSONLEX_FREE_STRUCT			(1 << 0)
 #define JSONLEX_FREE_STRVAL			(1 << 1)
+#define JSONLEX_CTX_OWNS_TOKENS		(1 << 2)
 typedef struct JsonLexContext
 {
 	const char *input;
@@ -130,9 +132,10 @@ typedef JsonParseErrorType (*json_scalar_action) (void *state, char *token, Json
  * to doing a pure parse with no side-effects, and is therefore exactly
  * what the json input routines do.
  *
- * The 'fname' and 'token' strings passed to these actions are palloc'd.
- * They are not free'd or used further by the parser, so the action function
- * is free to do what it wishes with them.
+ * By default, the 'fname' and 'token' strings passed to these actions are
+ * palloc'd.  They are not free'd or used further by the parser, so the action
+ * function is free to do what it wishes with them. This behavior may be
+ * modified by setJsonLexContextOwnsTokens().
  *
  * All action functions return JsonParseErrorType.  If the result isn't
  * JSON_SUCCESS, the parse is abandoned and that error code is returned.
@@ -216,6 +219,25 @@ extern JsonLexContext *makeJsonLexContextIncremental(JsonLexContext *lex,
 													 int encoding,
 													 bool need_escapes);
 
+/*
+ * Sets whether tokens passed to semantic action callbacks are owned by the
+ * context (in which case, the callback must duplicate the tokens for long-term
+ * storage) or by the callback (in which case, the callback must explicitly
+ * free tokens to avoid leaks).
+ *
+ * By default, this setting is false: the callback owns the tokens that are
+ * passed to it (and if parsing fails between the two object-field callbacks,
+ * the field name token will likely leak). If set to true, tokens will be freed
+ * by the lexer after the callback completes.
+ *
+ * Setting this to true is important for long-lived clients (such as libpq)
+ * that must not leak memory during a parse failure. For a server backend using
+ * memory contexts, or a client application which will exit on parse failure,
+ * this setting is less critical.
+ */
+extern void setJsonLexContextOwnsTokens(JsonLexContext *lex,
+										bool owned_by_context);
+
 extern void freeJsonLexContext(JsonLexContext *lex);
 
 /* lex one token */
-- 
2.34.1

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Jacob Champion (#6)
Re: pg_parse_json() should not leak token copies on failure

On 2024-10-01 Tu 3:04 PM, Jacob Champion wrote:

(Tom, thank you for the fixup in 75240f65!)

Off-list, Andrew suggested an alternative approach to the one I'd been
taking: let the client choose whether it wants to take token ownership
to begin with. v3 adds a new flag (and associated API) which will
allow libpq to refuse ownership of those tokens. The lexer is then
free to clean everything up during parse failures.

Usually I'm not a fan of "do the right thing" flags, but in this case,
libpq really is the outlier. And it's nice that existing clients
(potentially including extensions) don't have to worry about an API
change.

Yeah.

Generally looks good. Should we have a check in
setJsonLexContextOwnsTokens() that we haven't started parsing yet, for
the incremental case?

At the moment, we have a test matrix consisting of "standard frontend"
and "shlib frontend" tests for the incremental parser. I'm planning
for the v4 patch to extend that with a "owned/not owned" dimension;
any objections?

Sounds reasonable.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#8Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Andrew Dunstan (#7)
3 attachment(s)
Re: pg_parse_json() should not leak token copies on failure

On Wed, Oct 2, 2024 at 10:45 AM Andrew Dunstan <andrew@dunslane.net> wrote:

Generally looks good. Should we have a check in
setJsonLexContextOwnsTokens() that we haven't started parsing yet, for
the incremental case?

Good catch. Added in v4.

At the moment, we have a test matrix consisting of "standard frontend"
and "shlib frontend" tests for the incremental parser. I'm planning
for the v4 patch to extend that with a "owned/not owned" dimension;
any objections?

Sounds reasonable.

This is also done, along with runs of pgindent/perltidy.

I've added a 0002 as well. While running tests under Valgrind, it
complained about the uninitialized dummy_lex on the stack, which is
now fixed. (That bug was introduced with my patch in 0785d1b8b and is
technically orthogonal to 0001, but I figured I could track it here
for now.) This is also how I found out that my existing fuzzers
weren't checking for uninitialized memory, like I thought they were.
Grumble...

Thanks,
--Jacob

Attachments:

since-v3.diff.txttext/plain; charset=US-ASCII; name=since-v3.diff.txtDownload
1:  81b1d7cdb6 ! 1:  6a0d088cbe jsonapi: add lexer option to keep token ownership
    @@ Commit message
         of any tokens it wants to persist past the callback lifetime, but the
         lexer will handle necessary cleanup on failure.
     
    +    A -o option has been added to test_json_parser_incremental to exercise
    +    the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP
    +    tests make use of it. (The test program now cleans up allocated memory,
    +    so that tests can be usefully run under leak sanitizers.)
    +
      ## src/common/jsonapi.c ##
    +@@ src/common/jsonapi.c: struct JsonParserStack
    +  */
    + struct JsonIncrementalState
    + {
    ++	bool		started;
    + 	bool		is_last_chunk;
    + 	bool		partial_completed;
    + 	jsonapi_StrValType partial_token;
     @@ src/common/jsonapi.c: static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSem
      static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem);
      static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
    @@ src/common/jsonapi.c: makeJsonLexContextIncremental(JsonLexContext *lex, int enc
     +void
     +setJsonLexContextOwnsTokens(JsonLexContext *lex, bool owned_by_context)
     +{
    ++	if (lex->incremental && lex->inc_state->started)
    ++	{
    ++		/*
    ++		 * Switching this flag after parsing has already started is a
    ++		 * programming error.
    ++		 */
    ++		Assert(false);
    ++		return;
    ++	}
    ++
     +	if (owned_by_context)
     +		lex->flags |= JSONLEX_CTX_OWNS_TOKENS;
     +	else
    @@ src/common/jsonapi.c: inc_lex_level(JsonLexContext *lex)
     +	if (lex->incremental)
     +	{
     +		/*
    -+		 * Ensure freeJsonLexContext() remains safe even if no fname is assigned
    -+		 * at this level.
    ++		 * Ensure freeJsonLexContext() remains safe even if no fname is
    ++		 * assigned at this level.
     +		 */
     +		lex->pstack->fnames[lex->lex_level] = NULL;
     +	}
    @@ src/common/jsonapi.c: inc_lex_level(JsonLexContext *lex)
      static inline void
      dec_lex_level(JsonLexContext *lex)
      {
    -+	set_fname(lex, NULL); /* free the current level's fname, if needed */
    ++	set_fname(lex, NULL);		/* free the current level's fname, if needed */
      	lex->lex_level -= 1;
      }
      
    @@ src/common/jsonapi.c: freeJsonLexContext(JsonLexContext *lex)
      		FREE(lex->pstack);
      	}
      
    +@@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex,
    + 	lex->input = lex->token_terminator = lex->line_start = json;
    + 	lex->input_length = len;
    + 	lex->inc_state->is_last_chunk = is_last;
    ++	lex->inc_state->started = true;
    + 
    + 	/* get the initial token */
    + 	result = json_lex(lex);
     @@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex,
      						if (sfunc != NULL)
      						{
    @@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex,
     +
     +							/*
     +							 * Either ownership of the token passed to the
    -+							 * callback, or we need to free it now. Either way,
    -+							 * clear our pointer to it so it doesn't get freed
    -+							 * in the future.
    ++							 * callback, or we need to free it now. Either
    ++							 * way, clear our pointer to it so it doesn't get
    ++							 * freed in the future.
     +							 */
     +							if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
     +								FREE(pstack->scalar_val);
    @@ src/include/common/jsonapi.h: extern JsonLexContext *makeJsonLexContextIncrement
      extern void freeJsonLexContext(JsonLexContext *lex);
      
      /* lex one token */
    +
    + ## src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl ##
    +@@ src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl: use FindBin;
    + 
    + my $test_file = "$FindBin::RealBin/../tiny.json";
    + 
    +-my @exes =
    +-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
    ++my @exes = (
    ++	[ "test_json_parser_incremental", ],
    ++	[ "test_json_parser_incremental", "-o", ],
    ++	[ "test_json_parser_incremental_shlib", ],
    ++	[ "test_json_parser_incremental_shlib", "-o", ]);
    + 
    + foreach my $exe (@exes)
    + {
    +-	note "testing executable $exe";
    ++	note "testing executable @$exe";
    + 
    + 	# Test the  usage error
    +-	my ($stdout, $stderr) = run_command([ $exe, "-c", 10 ]);
    ++	my ($stdout, $stderr) = run_command([ @$exe, "-c", 10 ]);
    + 	like($stderr, qr/Usage:/, 'error message if not enough arguments');
    + 
    + 	# Test that we get success for small chunk sizes from 64 down to 1.
    + 	for (my $size = 64; $size > 0; $size--)
    + 	{
    +-		($stdout, $stderr) = run_command([ $exe, "-c", $size, $test_file ]);
    ++		($stdout, $stderr) = run_command([ @$exe, "-c", $size, $test_file ]);
    + 
    + 		like($stdout, qr/SUCCESS/, "chunk size $size: test succeeds");
    + 		is($stderr, "", "chunk size $size: no error output");
    +
    + ## src/test/modules/test_json_parser/t/002_inline.pl ##
    +@@ src/test/modules/test_json_parser/t/002_inline.pl: use Test::More;
    + use File::Temp qw(tempfile);
    + 
    + my $dir = PostgreSQL::Test::Utils::tempdir;
    +-my $exe;
    ++my @exe;
    + 
    + sub test
    + {
    +@@ src/test/modules/test_json_parser/t/002_inline.pl: sub test
    + 
    + 	foreach my $size (reverse(1 .. $chunk))
    + 	{
    +-		my ($stdout, $stderr) = run_command([ $exe, "-c", $size, $fname ]);
    ++		my ($stdout, $stderr) = run_command([ @exe, "-c", $size, $fname ]);
    + 
    + 		if (defined($params{error}))
    + 		{
    +@@ src/test/modules/test_json_parser/t/002_inline.pl: sub test
    + 	}
    + }
    + 
    +-my @exes =
    +-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
    ++my @exes = (
    ++	[ "test_json_parser_incremental", ],
    ++	[ "test_json_parser_incremental", "-o", ],
    ++	[ "test_json_parser_incremental_shlib", ],
    ++	[ "test_json_parser_incremental_shlib", "-o", ]);
    + 
    + foreach (@exes)
    + {
    +-	$exe = $_;
    +-	note "testing executable $exe";
    ++	@exe = @$_;
    ++	note "testing executable @exe";
    + 
    + 	test("number", "12345");
    + 	test("string", '"hello"');
    +
    + ## src/test/modules/test_json_parser/t/003_test_semantic.pl ##
    +@@ src/test/modules/test_json_parser/t/003_test_semantic.pl: use File::Temp qw(tempfile);
    + my $test_file = "$FindBin::RealBin/../tiny.json";
    + my $test_out = "$FindBin::RealBin/../tiny.out";
    + 
    +-my @exes =
    +-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
    ++my @exes = (
    ++	[ "test_json_parser_incremental", ],
    ++	[ "test_json_parser_incremental", "-o", ],
    ++	[ "test_json_parser_incremental_shlib", ],
    ++	[ "test_json_parser_incremental_shlib", "-o", ]);
    + 
    + foreach my $exe (@exes)
    + {
    +-	note "testing executable $exe";
    ++	note "testing executable @$exe";
    + 
    +-	my ($stdout, $stderr) = run_command([ $exe, "-s", $test_file ]);
    ++	my ($stdout, $stderr) = run_command([ @$exe, "-s", $test_file ]);
    + 
    + 	is($stderr, "", "no error output");
    + 
    +
    + ## src/test/modules/test_json_parser/test_json_parser_incremental.c ##
    +@@
    +  * If the -s flag is given, the program does semantic processing. This should
    +  * just mirror back the json, albeit with white space changes.
    +  *
    ++ * If the -o flag is given, the JSONLEX_CTX_OWNS_TOKENS flag is set. (This can
    ++ * be used in combination with a leak sanitizer; without the option, the parser
    ++ * may leak memory with invalid JSON.)
    ++ *
    +  * The argument specifies the file containing the JSON input.
    +  *
    +  *-------------------------------------------------------------------------
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: static JsonSemAction sem = {
    + 	.scalar = do_scalar
    + };
    + 
    ++static bool lex_owns_tokens = false;
    ++
    + int
    + main(int argc, char **argv)
    + {
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: main(int argc, char **argv)
    + 	char	   *testfile;
    + 	int			c;
    + 	bool		need_strings = false;
    ++	int			ret = 0;
    + 
    + 	pg_logging_init(argv[0]);
    + 
    +-	while ((c = getopt(argc, argv, "c:s")) != -1)
    ++	while ((c = getopt(argc, argv, "c:os")) != -1)
    + 	{
    + 		switch (c)
    + 		{
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: main(int argc, char **argv)
    + 				if (chunk_size > BUFSIZE)
    + 					pg_fatal("chunk size cannot exceed %d", BUFSIZE);
    + 				break;
    ++			case 'o':			/* switch token ownership */
    ++				lex_owns_tokens = true;
    ++				break;
    + 			case 's':			/* do semantic processing */
    + 				testsem = &sem;
    + 				sem.semstate = palloc(sizeof(struct DoState));
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: main(int argc, char **argv)
    + 
    + 	if (optind < argc)
    + 	{
    +-		testfile = pg_strdup(argv[optind]);
    ++		testfile = argv[optind];
    + 		optind++;
    + 	}
    + 	else
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: main(int argc, char **argv)
    + 	}
    + 
    + 	makeJsonLexContextIncremental(&lex, PG_UTF8, need_strings);
    ++	setJsonLexContextOwnsTokens(&lex, lex_owns_tokens);
    + 	initStringInfo(&json);
    + 
    + 	if ((json_file = fopen(testfile, PG_BINARY_R)) == NULL)
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: main(int argc, char **argv)
    + 			if (result != JSON_INCOMPLETE)
    + 			{
    + 				fprintf(stderr, "%s\n", json_errdetail(result, &lex));
    +-				exit(1);
    ++				ret = 1;
    ++				goto cleanup;
    + 			}
    + 			resetStringInfo(&json);
    + 		}
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: main(int argc, char **argv)
    + 			if (result != JSON_SUCCESS)
    + 			{
    + 				fprintf(stderr, "%s\n", json_errdetail(result, &lex));
    +-				exit(1);
    ++				ret = 1;
    ++				goto cleanup;
    + 			}
    + 			if (!need_strings)
    + 				printf("SUCCESS!\n");
    + 			break;
    + 		}
    + 	}
    ++
    ++cleanup:
    + 	fclose(json_file);
    +-	exit(0);
    ++	freeJsonLexContext(&lex);
    ++	free(json.data);
    ++
    ++	return ret;
    + }
    + 
    + /*
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: do_object_field_start(void *state, char *fname, bool isnull)
    + static JsonParseErrorType
    + do_object_field_end(void *state, char *fname, bool isnull)
    + {
    +-	/* nothing to do really */
    ++	if (!lex_owns_tokens)
    ++		free(fname);
    + 
    + 	return JSON_SUCCESS;
    + }
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: do_scalar(void *state, char *token, JsonTokenType tokentype)
    + 	else
    + 		printf("%s", token);
    + 
    ++	if (!lex_owns_tokens)
    ++		free(token);
    ++
    + 	return JSON_SUCCESS;
    + }
    + 
    +@@ src/test/modules/test_json_parser/test_json_parser_incremental.c: usage(const char *progname)
    + {
    + 	fprintf(stderr, "Usage: %s [OPTION ...] testfile\n", progname);
    + 	fprintf(stderr, "Options:\n");
    +-	fprintf(stderr, "  -c chunksize      size of piece fed to parser (default 64)n");
    ++	fprintf(stderr, "  -c chunksize      size of piece fed to parser (default 64)\n");
    ++	fprintf(stderr, "  -o                set JSONLEX_CTX_OWNS_TOKENS for leak checking\n");
    + 	fprintf(stderr, "  -s                do semantic processing\n");
    + 
    + }
-:  ---------- > 2:  d3e639ba2b jsonapi: fully initialize dummy lexer
v4-0001-jsonapi-add-lexer-option-to-keep-token-ownership.patchapplication/octet-stream; name=v4-0001-jsonapi-add-lexer-option-to-keep-token-ownership.patchDownload
From 6a0d088cbe7d4ece3762cb867e3d6588d18ad8f6 Mon Sep 17 00:00:00 2001
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Thu, 25 Apr 2024 15:26:40 -0700
Subject: [PATCH v4 1/2] jsonapi: add lexer option to keep token ownership

Commit 0785d1b8b adds support for libpq as a JSON client, but
allocations for string tokens can still be leaked during parsing
failures. This is tricky to fix for the object_field semantic callbacks:
the field name must remain valid until the end of the object, but if a
parsing error is encountered partway through, object_field_end() won't
be invoked and the client won't get a chance to free the field name.

At Andrew's suggestion, add a flag to switch the ownership of parsed
tokens to the lexer. When this is enabled, the client must make a copy
of any tokens it wants to persist past the callback lifetime, but the
lexer will handle necessary cleanup on failure.

A -o option has been added to test_json_parser_incremental to exercise
the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP
tests make use of it. (The test program now cleans up allocated memory,
so that tests can be usefully run under leak sanitizers.)
---
 src/common/jsonapi.c                          | 102 ++++++++++++++++--
 src/include/common/jsonapi.h                  |  28 ++++-
 .../t/001_test_json_parser_incremental.pl     |  13 ++-
 .../modules/test_json_parser/t/002_inline.pl  |  15 +--
 .../test_json_parser/t/003_test_semantic.pl   |  11 +-
 .../test_json_parser_incremental.c            |  37 +++++--
 6 files changed, 173 insertions(+), 33 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 45838d8a18..df6e633b5e 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -161,6 +161,7 @@ struct JsonParserStack
  */
 struct JsonIncrementalState
 {
+	bool		started;
 	bool		is_last_chunk;
 	bool		partial_completed;
 	jsonapi_StrValType partial_token;
@@ -280,6 +281,7 @@ static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSem
 static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem);
 static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
 static bool allocate_incremental_state(JsonLexContext *lex);
+static inline void set_fname(JsonLexContext *lex, char *fname);
 
 /* the null action object used for pure validation */
 const JsonSemAction nullSemAction =
@@ -437,7 +439,7 @@ allocate_incremental_state(JsonLexContext *lex)
 			   *fnull;
 
 	lex->inc_state = ALLOC0(sizeof(JsonIncrementalState));
-	pstack = ALLOC(sizeof(JsonParserStack));
+	pstack = ALLOC0(sizeof(JsonParserStack));
 	prediction = ALLOC(JS_STACK_CHUNK_SIZE * JS_MAX_PROD_LEN);
 	fnames = ALLOC(JS_STACK_CHUNK_SIZE * sizeof(char *));
 	fnull = ALLOC(JS_STACK_CHUNK_SIZE * sizeof(bool));
@@ -464,10 +466,17 @@ allocate_incremental_state(JsonLexContext *lex)
 	lex->pstack = pstack;
 	lex->pstack->stack_size = JS_STACK_CHUNK_SIZE;
 	lex->pstack->prediction = prediction;
-	lex->pstack->pred_index = 0;
 	lex->pstack->fnames = fnames;
 	lex->pstack->fnull = fnull;
 
+	/*
+	 * fnames between 0 and lex_level must always be defined so that
+	 * freeJsonLexContext() can handle them safely. inc/dec_lex_level() handle
+	 * the rest.
+	 */
+	Assert(lex->lex_level == 0);
+	lex->pstack->fnames[0] = NULL;
+
 	lex->incremental = true;
 	return true;
 }
@@ -530,6 +539,25 @@ makeJsonLexContextIncremental(JsonLexContext *lex, int encoding,
 	return lex;
 }
 
+void
+setJsonLexContextOwnsTokens(JsonLexContext *lex, bool owned_by_context)
+{
+	if (lex->incremental && lex->inc_state->started)
+	{
+		/*
+		 * Switching this flag after parsing has already started is a
+		 * programming error.
+		 */
+		Assert(false);
+		return;
+	}
+
+	if (owned_by_context)
+		lex->flags |= JSONLEX_CTX_OWNS_TOKENS;
+	else
+		lex->flags &= ~JSONLEX_CTX_OWNS_TOKENS;
+}
+
 static inline bool
 inc_lex_level(JsonLexContext *lex)
 {
@@ -569,12 +597,23 @@ inc_lex_level(JsonLexContext *lex)
 	}
 
 	lex->lex_level += 1;
+
+	if (lex->incremental)
+	{
+		/*
+		 * Ensure freeJsonLexContext() remains safe even if no fname is
+		 * assigned at this level.
+		 */
+		lex->pstack->fnames[lex->lex_level] = NULL;
+	}
+
 	return true;
 }
 
 static inline void
 dec_lex_level(JsonLexContext *lex)
 {
+	set_fname(lex, NULL);		/* free the current level's fname, if needed */
 	lex->lex_level -= 1;
 }
 
@@ -608,6 +647,15 @@ have_prediction(JsonParserStack *pstack)
 static inline void
 set_fname(JsonLexContext *lex, char *fname)
 {
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+	{
+		/*
+		 * Don't leak prior fnames. If one hasn't been assigned yet,
+		 * inc_lex_level ensured that it's NULL (and therefore safe to free).
+		 */
+		FREE(lex->pstack->fnames[lex->lex_level]);
+	}
+
 	lex->pstack->fnames[lex->lex_level] = fname;
 }
 
@@ -655,8 +703,19 @@ freeJsonLexContext(JsonLexContext *lex)
 		jsonapi_termStringInfo(&lex->inc_state->partial_token);
 		FREE(lex->inc_state);
 		FREE(lex->pstack->prediction);
+
+		if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		{
+			int			i;
+
+			/* Clean up any tokens that were left behind. */
+			for (i = 0; i <= lex->lex_level; i++)
+				FREE(lex->pstack->fnames[i]);
+		}
+
 		FREE(lex->pstack->fnames);
 		FREE(lex->pstack->fnull);
+		FREE(lex->pstack->scalar_val);
 		FREE(lex->pstack);
 	}
 
@@ -826,6 +885,7 @@ pg_parse_json_incremental(JsonLexContext *lex,
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->input_length = len;
 	lex->inc_state->is_last_chunk = is_last;
+	lex->inc_state->started = true;
 
 	/* get the initial token */
 	result = json_lex(lex);
@@ -1086,6 +1146,17 @@ pg_parse_json_incremental(JsonLexContext *lex,
 						if (sfunc != NULL)
 						{
 							result = (*sfunc) (sem->semstate, pstack->scalar_val, pstack->scalar_tok);
+
+							/*
+							 * Either ownership of the token passed to the
+							 * callback, or we need to free it now. Either
+							 * way, clear our pointer to it so it doesn't get
+							 * freed in the future.
+							 */
+							if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+								FREE(pstack->scalar_val);
+							pstack->scalar_val = NULL;
+
 							if (result != JSON_SUCCESS)
 								return result;
 						}
@@ -1221,11 +1292,17 @@ parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 	/* consume the token */
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(val);
 		return result;
+	}
 
-	/* invoke the callback */
+	/* invoke the callback, which may take ownership of val */
 	result = (*sfunc) (sem->semstate, val, tok);
 
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		FREE(val);
+
 	return result;
 }
 
@@ -1238,7 +1315,7 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	 * generally call a field name a "key".
 	 */
 
-	char	   *fname = NULL;	/* keep compiler quiet */
+	char	   *fname = NULL;
 	json_ofield_action ostart = sem->object_field_start;
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
@@ -1255,11 +1332,17 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	}
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(fname);
 		return result;
+	}
 
 	result = lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(fname);
 		return result;
+	}
 
 	tok = lex_peek(lex);
 	isnull = tok == JSON_TOKEN_NULL;
@@ -1268,7 +1351,7 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	{
 		result = (*ostart) (sem->semstate, fname, isnull);
 		if (result != JSON_SUCCESS)
-			return result;
+			goto ofield_cleanup;
 	}
 
 	switch (tok)
@@ -1283,16 +1366,19 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 			result = parse_scalar(lex, sem);
 	}
 	if (result != JSON_SUCCESS)
-		return result;
+		goto ofield_cleanup;
 
 	if (oend != NULL)
 	{
 		result = (*oend) (sem->semstate, fname, isnull);
 		if (result != JSON_SUCCESS)
-			return result;
+			goto ofield_cleanup;
 	}
 
-	return JSON_SUCCESS;
+ofield_cleanup:
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		FREE(fname);
+	return result;
 }
 
 static JsonParseErrorType
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index c524ff5be8..167615a557 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -92,9 +92,11 @@ typedef struct JsonIncrementalState JsonIncrementalState;
  * conjunction with token_start.
  *
  * JSONLEX_FREE_STRUCT/STRVAL are used to drive freeJsonLexContext.
+ * JSONLEX_CTX_OWNS_TOKENS is used by setJsonLexContextOwnsTokens.
  */
 #define JSONLEX_FREE_STRUCT			(1 << 0)
 #define JSONLEX_FREE_STRVAL			(1 << 1)
+#define JSONLEX_CTX_OWNS_TOKENS		(1 << 2)
 typedef struct JsonLexContext
 {
 	const char *input;
@@ -130,9 +132,10 @@ typedef JsonParseErrorType (*json_scalar_action) (void *state, char *token, Json
  * to doing a pure parse with no side-effects, and is therefore exactly
  * what the json input routines do.
  *
- * The 'fname' and 'token' strings passed to these actions are palloc'd.
- * They are not free'd or used further by the parser, so the action function
- * is free to do what it wishes with them.
+ * By default, the 'fname' and 'token' strings passed to these actions are
+ * palloc'd.  They are not free'd or used further by the parser, so the action
+ * function is free to do what it wishes with them. This behavior may be
+ * modified by setJsonLexContextOwnsTokens().
  *
  * All action functions return JsonParseErrorType.  If the result isn't
  * JSON_SUCCESS, the parse is abandoned and that error code is returned.
@@ -216,6 +219,25 @@ extern JsonLexContext *makeJsonLexContextIncremental(JsonLexContext *lex,
 													 int encoding,
 													 bool need_escapes);
 
+/*
+ * Sets whether tokens passed to semantic action callbacks are owned by the
+ * context (in which case, the callback must duplicate the tokens for long-term
+ * storage) or by the callback (in which case, the callback must explicitly
+ * free tokens to avoid leaks).
+ *
+ * By default, this setting is false: the callback owns the tokens that are
+ * passed to it (and if parsing fails between the two object-field callbacks,
+ * the field name token will likely leak). If set to true, tokens will be freed
+ * by the lexer after the callback completes.
+ *
+ * Setting this to true is important for long-lived clients (such as libpq)
+ * that must not leak memory during a parse failure. For a server backend using
+ * memory contexts, or a client application which will exit on parse failure,
+ * this setting is less critical.
+ */
+extern void setJsonLexContextOwnsTokens(JsonLexContext *lex,
+										bool owned_by_context);
+
 extern void freeJsonLexContext(JsonLexContext *lex);
 
 /* lex one token */
diff --git a/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl b/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
index 8cc42e8e29..0c663b8e68 100644
--- a/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
+++ b/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
@@ -13,21 +13,24 @@ use FindBin;
 
 my $test_file = "$FindBin::RealBin/../tiny.json";
 
-my @exes =
-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
+my @exes = (
+	[ "test_json_parser_incremental", ],
+	[ "test_json_parser_incremental", "-o", ],
+	[ "test_json_parser_incremental_shlib", ],
+	[ "test_json_parser_incremental_shlib", "-o", ]);
 
 foreach my $exe (@exes)
 {
-	note "testing executable $exe";
+	note "testing executable @$exe";
 
 	# Test the  usage error
-	my ($stdout, $stderr) = run_command([ $exe, "-c", 10 ]);
+	my ($stdout, $stderr) = run_command([ @$exe, "-c", 10 ]);
 	like($stderr, qr/Usage:/, 'error message if not enough arguments');
 
 	# Test that we get success for small chunk sizes from 64 down to 1.
 	for (my $size = 64; $size > 0; $size--)
 	{
-		($stdout, $stderr) = run_command([ $exe, "-c", $size, $test_file ]);
+		($stdout, $stderr) = run_command([ @$exe, "-c", $size, $test_file ]);
 
 		like($stdout, qr/SUCCESS/, "chunk size $size: test succeeds");
 		is($stderr, "", "chunk size $size: no error output");
diff --git a/src/test/modules/test_json_parser/t/002_inline.pl b/src/test/modules/test_json_parser/t/002_inline.pl
index 5b6c6dc4ae..71c462b319 100644
--- a/src/test/modules/test_json_parser/t/002_inline.pl
+++ b/src/test/modules/test_json_parser/t/002_inline.pl
@@ -13,7 +13,7 @@ use Test::More;
 use File::Temp qw(tempfile);
 
 my $dir = PostgreSQL::Test::Utils::tempdir;
-my $exe;
+my @exe;
 
 sub test
 {
@@ -35,7 +35,7 @@ sub test
 
 	foreach my $size (reverse(1 .. $chunk))
 	{
-		my ($stdout, $stderr) = run_command([ $exe, "-c", $size, $fname ]);
+		my ($stdout, $stderr) = run_command([ @exe, "-c", $size, $fname ]);
 
 		if (defined($params{error}))
 		{
@@ -53,13 +53,16 @@ sub test
 	}
 }
 
-my @exes =
-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
+my @exes = (
+	[ "test_json_parser_incremental", ],
+	[ "test_json_parser_incremental", "-o", ],
+	[ "test_json_parser_incremental_shlib", ],
+	[ "test_json_parser_incremental_shlib", "-o", ]);
 
 foreach (@exes)
 {
-	$exe = $_;
-	note "testing executable $exe";
+	@exe = @$_;
+	note "testing executable @exe";
 
 	test("number", "12345");
 	test("string", '"hello"');
diff --git a/src/test/modules/test_json_parser/t/003_test_semantic.pl b/src/test/modules/test_json_parser/t/003_test_semantic.pl
index c11480172d..c57ccdb660 100644
--- a/src/test/modules/test_json_parser/t/003_test_semantic.pl
+++ b/src/test/modules/test_json_parser/t/003_test_semantic.pl
@@ -16,14 +16,17 @@ use File::Temp qw(tempfile);
 my $test_file = "$FindBin::RealBin/../tiny.json";
 my $test_out = "$FindBin::RealBin/../tiny.out";
 
-my @exes =
-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
+my @exes = (
+	[ "test_json_parser_incremental", ],
+	[ "test_json_parser_incremental", "-o", ],
+	[ "test_json_parser_incremental_shlib", ],
+	[ "test_json_parser_incremental_shlib", "-o", ]);
 
 foreach my $exe (@exes)
 {
-	note "testing executable $exe";
+	note "testing executable @$exe";
 
-	my ($stdout, $stderr) = run_command([ $exe, "-s", $test_file ]);
+	my ($stdout, $stderr) = run_command([ @$exe, "-s", $test_file ]);
 
 	is($stderr, "", "no error output");
 
diff --git a/src/test/modules/test_json_parser/test_json_parser_incremental.c b/src/test/modules/test_json_parser/test_json_parser_incremental.c
index 294e5f74ea..0b02b5203b 100644
--- a/src/test/modules/test_json_parser/test_json_parser_incremental.c
+++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
@@ -18,6 +18,10 @@
  * If the -s flag is given, the program does semantic processing. This should
  * just mirror back the json, albeit with white space changes.
  *
+ * If the -o flag is given, the JSONLEX_CTX_OWNS_TOKENS flag is set. (This can
+ * be used in combination with a leak sanitizer; without the option, the parser
+ * may leak memory with invalid JSON.)
+ *
  * The argument specifies the file containing the JSON input.
  *
  *-------------------------------------------------------------------------
@@ -72,6 +76,8 @@ static JsonSemAction sem = {
 	.scalar = do_scalar
 };
 
+static bool lex_owns_tokens = false;
+
 int
 main(int argc, char **argv)
 {
@@ -88,10 +94,11 @@ main(int argc, char **argv)
 	char	   *testfile;
 	int			c;
 	bool		need_strings = false;
+	int			ret = 0;
 
 	pg_logging_init(argv[0]);
 
-	while ((c = getopt(argc, argv, "c:s")) != -1)
+	while ((c = getopt(argc, argv, "c:os")) != -1)
 	{
 		switch (c)
 		{
@@ -100,6 +107,9 @@ main(int argc, char **argv)
 				if (chunk_size > BUFSIZE)
 					pg_fatal("chunk size cannot exceed %d", BUFSIZE);
 				break;
+			case 'o':			/* switch token ownership */
+				lex_owns_tokens = true;
+				break;
 			case 's':			/* do semantic processing */
 				testsem = &sem;
 				sem.semstate = palloc(sizeof(struct DoState));
@@ -112,7 +122,7 @@ main(int argc, char **argv)
 
 	if (optind < argc)
 	{
-		testfile = pg_strdup(argv[optind]);
+		testfile = argv[optind];
 		optind++;
 	}
 	else
@@ -122,6 +132,7 @@ main(int argc, char **argv)
 	}
 
 	makeJsonLexContextIncremental(&lex, PG_UTF8, need_strings);
+	setJsonLexContextOwnsTokens(&lex, lex_owns_tokens);
 	initStringInfo(&json);
 
 	if ((json_file = fopen(testfile, PG_BINARY_R)) == NULL)
@@ -160,7 +171,8 @@ main(int argc, char **argv)
 			if (result != JSON_INCOMPLETE)
 			{
 				fprintf(stderr, "%s\n", json_errdetail(result, &lex));
-				exit(1);
+				ret = 1;
+				goto cleanup;
 			}
 			resetStringInfo(&json);
 		}
@@ -172,15 +184,21 @@ main(int argc, char **argv)
 			if (result != JSON_SUCCESS)
 			{
 				fprintf(stderr, "%s\n", json_errdetail(result, &lex));
-				exit(1);
+				ret = 1;
+				goto cleanup;
 			}
 			if (!need_strings)
 				printf("SUCCESS!\n");
 			break;
 		}
 	}
+
+cleanup:
 	fclose(json_file);
-	exit(0);
+	freeJsonLexContext(&lex);
+	free(json.data);
+
+	return ret;
 }
 
 /*
@@ -230,7 +248,8 @@ do_object_field_start(void *state, char *fname, bool isnull)
 static JsonParseErrorType
 do_object_field_end(void *state, char *fname, bool isnull)
 {
-	/* nothing to do really */
+	if (!lex_owns_tokens)
+		free(fname);
 
 	return JSON_SUCCESS;
 }
@@ -291,6 +310,9 @@ do_scalar(void *state, char *token, JsonTokenType tokentype)
 	else
 		printf("%s", token);
 
+	if (!lex_owns_tokens)
+		free(token);
+
 	return JSON_SUCCESS;
 }
 
@@ -343,7 +365,8 @@ usage(const char *progname)
 {
 	fprintf(stderr, "Usage: %s [OPTION ...] testfile\n", progname);
 	fprintf(stderr, "Options:\n");
-	fprintf(stderr, "  -c chunksize      size of piece fed to parser (default 64)n");
+	fprintf(stderr, "  -c chunksize      size of piece fed to parser (default 64)\n");
+	fprintf(stderr, "  -o                set JSONLEX_CTX_OWNS_TOKENS for leak checking\n");
 	fprintf(stderr, "  -s                do semantic processing\n");
 
 }
-- 
2.34.1

v4-0002-jsonapi-fully-initialize-dummy-lexer.patchapplication/octet-stream; name=v4-0002-jsonapi-fully-initialize-dummy-lexer.patchDownload
From d3e639ba2bacf64fc0d2eb3aa9364a87030335f5 Mon Sep 17 00:00:00 2001
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Mon, 7 Oct 2024 14:41:31 -0700
Subject: [PATCH v4 2/2] jsonapi: fully initialize dummy lexer

Valgrind reports that checks on lex->inc_state are undefined for the
"dummy lexer" used for incremental parsing, since it's only partially
initialized on the stack. This was introduced in 0785d1b8b2.
Zero-initialize the whole struct.
---
 src/common/jsonapi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index df6e633b5e..0e2a82ad7a 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -1622,7 +1622,7 @@ json_lex(JsonLexContext *lex)
 		jsonapi_StrValType *ptok = &(lex->inc_state->partial_token);
 		size_t		added = 0;
 		bool		tok_done = false;
-		JsonLexContext dummy_lex;
+		JsonLexContext dummy_lex = {0};
 		JsonParseErrorType partial_result;
 
 		if (ptok->data[0] == '"')
-- 
2.34.1

#9Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Jacob Champion (#8)
1 attachment(s)
Re: pg_parse_json() should not leak token copies on failure

On Mon, Oct 7, 2024 at 3:45 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:

I've added a 0002 as well.

0002 has since been applied [1]https://git.postgresql.org/cgit/postgresql.git/commit/?id=41b023946d so I'm reattaching v4-0001 to get the
cfbot happy again.

--Jacob

[1]: https://git.postgresql.org/cgit/postgresql.git/commit/?id=41b023946d

Attachments:

v4-0001-jsonapi-add-lexer-option-to-keep-token-ownership.patchapplication/octet-stream; name=v4-0001-jsonapi-add-lexer-option-to-keep-token-ownership.patchDownload
From 6a0d088cbe7d4ece3762cb867e3d6588d18ad8f6 Mon Sep 17 00:00:00 2001
From: Jacob Champion <jacob.champion@enterprisedb.com>
Date: Thu, 25 Apr 2024 15:26:40 -0700
Subject: [PATCH v4 1/2] jsonapi: add lexer option to keep token ownership

Commit 0785d1b8b adds support for libpq as a JSON client, but
allocations for string tokens can still be leaked during parsing
failures. This is tricky to fix for the object_field semantic callbacks:
the field name must remain valid until the end of the object, but if a
parsing error is encountered partway through, object_field_end() won't
be invoked and the client won't get a chance to free the field name.

At Andrew's suggestion, add a flag to switch the ownership of parsed
tokens to the lexer. When this is enabled, the client must make a copy
of any tokens it wants to persist past the callback lifetime, but the
lexer will handle necessary cleanup on failure.

A -o option has been added to test_json_parser_incremental to exercise
the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP
tests make use of it. (The test program now cleans up allocated memory,
so that tests can be usefully run under leak sanitizers.)
---
 src/common/jsonapi.c                          | 102 ++++++++++++++++--
 src/include/common/jsonapi.h                  |  28 ++++-
 .../t/001_test_json_parser_incremental.pl     |  13 ++-
 .../modules/test_json_parser/t/002_inline.pl  |  15 +--
 .../test_json_parser/t/003_test_semantic.pl   |  11 +-
 .../test_json_parser_incremental.c            |  37 +++++--
 6 files changed, 173 insertions(+), 33 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 45838d8a18..df6e633b5e 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -161,6 +161,7 @@ struct JsonParserStack
  */
 struct JsonIncrementalState
 {
+	bool		started;
 	bool		is_last_chunk;
 	bool		partial_completed;
 	jsonapi_StrValType partial_token;
@@ -280,6 +281,7 @@ static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSem
 static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem);
 static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
 static bool allocate_incremental_state(JsonLexContext *lex);
+static inline void set_fname(JsonLexContext *lex, char *fname);
 
 /* the null action object used for pure validation */
 const JsonSemAction nullSemAction =
@@ -437,7 +439,7 @@ allocate_incremental_state(JsonLexContext *lex)
 			   *fnull;
 
 	lex->inc_state = ALLOC0(sizeof(JsonIncrementalState));
-	pstack = ALLOC(sizeof(JsonParserStack));
+	pstack = ALLOC0(sizeof(JsonParserStack));
 	prediction = ALLOC(JS_STACK_CHUNK_SIZE * JS_MAX_PROD_LEN);
 	fnames = ALLOC(JS_STACK_CHUNK_SIZE * sizeof(char *));
 	fnull = ALLOC(JS_STACK_CHUNK_SIZE * sizeof(bool));
@@ -464,10 +466,17 @@ allocate_incremental_state(JsonLexContext *lex)
 	lex->pstack = pstack;
 	lex->pstack->stack_size = JS_STACK_CHUNK_SIZE;
 	lex->pstack->prediction = prediction;
-	lex->pstack->pred_index = 0;
 	lex->pstack->fnames = fnames;
 	lex->pstack->fnull = fnull;
 
+	/*
+	 * fnames between 0 and lex_level must always be defined so that
+	 * freeJsonLexContext() can handle them safely. inc/dec_lex_level() handle
+	 * the rest.
+	 */
+	Assert(lex->lex_level == 0);
+	lex->pstack->fnames[0] = NULL;
+
 	lex->incremental = true;
 	return true;
 }
@@ -530,6 +539,25 @@ makeJsonLexContextIncremental(JsonLexContext *lex, int encoding,
 	return lex;
 }
 
+void
+setJsonLexContextOwnsTokens(JsonLexContext *lex, bool owned_by_context)
+{
+	if (lex->incremental && lex->inc_state->started)
+	{
+		/*
+		 * Switching this flag after parsing has already started is a
+		 * programming error.
+		 */
+		Assert(false);
+		return;
+	}
+
+	if (owned_by_context)
+		lex->flags |= JSONLEX_CTX_OWNS_TOKENS;
+	else
+		lex->flags &= ~JSONLEX_CTX_OWNS_TOKENS;
+}
+
 static inline bool
 inc_lex_level(JsonLexContext *lex)
 {
@@ -569,12 +597,23 @@ inc_lex_level(JsonLexContext *lex)
 	}
 
 	lex->lex_level += 1;
+
+	if (lex->incremental)
+	{
+		/*
+		 * Ensure freeJsonLexContext() remains safe even if no fname is
+		 * assigned at this level.
+		 */
+		lex->pstack->fnames[lex->lex_level] = NULL;
+	}
+
 	return true;
 }
 
 static inline void
 dec_lex_level(JsonLexContext *lex)
 {
+	set_fname(lex, NULL);		/* free the current level's fname, if needed */
 	lex->lex_level -= 1;
 }
 
@@ -608,6 +647,15 @@ have_prediction(JsonParserStack *pstack)
 static inline void
 set_fname(JsonLexContext *lex, char *fname)
 {
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+	{
+		/*
+		 * Don't leak prior fnames. If one hasn't been assigned yet,
+		 * inc_lex_level ensured that it's NULL (and therefore safe to free).
+		 */
+		FREE(lex->pstack->fnames[lex->lex_level]);
+	}
+
 	lex->pstack->fnames[lex->lex_level] = fname;
 }
 
@@ -655,8 +703,19 @@ freeJsonLexContext(JsonLexContext *lex)
 		jsonapi_termStringInfo(&lex->inc_state->partial_token);
 		FREE(lex->inc_state);
 		FREE(lex->pstack->prediction);
+
+		if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		{
+			int			i;
+
+			/* Clean up any tokens that were left behind. */
+			for (i = 0; i <= lex->lex_level; i++)
+				FREE(lex->pstack->fnames[i]);
+		}
+
 		FREE(lex->pstack->fnames);
 		FREE(lex->pstack->fnull);
+		FREE(lex->pstack->scalar_val);
 		FREE(lex->pstack);
 	}
 
@@ -826,6 +885,7 @@ pg_parse_json_incremental(JsonLexContext *lex,
 	lex->input = lex->token_terminator = lex->line_start = json;
 	lex->input_length = len;
 	lex->inc_state->is_last_chunk = is_last;
+	lex->inc_state->started = true;
 
 	/* get the initial token */
 	result = json_lex(lex);
@@ -1086,6 +1146,17 @@ pg_parse_json_incremental(JsonLexContext *lex,
 						if (sfunc != NULL)
 						{
 							result = (*sfunc) (sem->semstate, pstack->scalar_val, pstack->scalar_tok);
+
+							/*
+							 * Either ownership of the token passed to the
+							 * callback, or we need to free it now. Either
+							 * way, clear our pointer to it so it doesn't get
+							 * freed in the future.
+							 */
+							if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+								FREE(pstack->scalar_val);
+							pstack->scalar_val = NULL;
+
 							if (result != JSON_SUCCESS)
 								return result;
 						}
@@ -1221,11 +1292,17 @@ parse_scalar(JsonLexContext *lex, const JsonSemAction *sem)
 	/* consume the token */
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(val);
 		return result;
+	}
 
-	/* invoke the callback */
+	/* invoke the callback, which may take ownership of val */
 	result = (*sfunc) (sem->semstate, val, tok);
 
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		FREE(val);
+
 	return result;
 }
 
@@ -1238,7 +1315,7 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	 * generally call a field name a "key".
 	 */
 
-	char	   *fname = NULL;	/* keep compiler quiet */
+	char	   *fname = NULL;
 	json_ofield_action ostart = sem->object_field_start;
 	json_ofield_action oend = sem->object_field_end;
 	bool		isnull;
@@ -1255,11 +1332,17 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	}
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(fname);
 		return result;
+	}
 
 	result = lex_expect(JSON_PARSE_OBJECT_LABEL, lex, JSON_TOKEN_COLON);
 	if (result != JSON_SUCCESS)
+	{
+		FREE(fname);
 		return result;
+	}
 
 	tok = lex_peek(lex);
 	isnull = tok == JSON_TOKEN_NULL;
@@ -1268,7 +1351,7 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 	{
 		result = (*ostart) (sem->semstate, fname, isnull);
 		if (result != JSON_SUCCESS)
-			return result;
+			goto ofield_cleanup;
 	}
 
 	switch (tok)
@@ -1283,16 +1366,19 @@ parse_object_field(JsonLexContext *lex, const JsonSemAction *sem)
 			result = parse_scalar(lex, sem);
 	}
 	if (result != JSON_SUCCESS)
-		return result;
+		goto ofield_cleanup;
 
 	if (oend != NULL)
 	{
 		result = (*oend) (sem->semstate, fname, isnull);
 		if (result != JSON_SUCCESS)
-			return result;
+			goto ofield_cleanup;
 	}
 
-	return JSON_SUCCESS;
+ofield_cleanup:
+	if (lex->flags & JSONLEX_CTX_OWNS_TOKENS)
+		FREE(fname);
+	return result;
 }
 
 static JsonParseErrorType
diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index c524ff5be8..167615a557 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -92,9 +92,11 @@ typedef struct JsonIncrementalState JsonIncrementalState;
  * conjunction with token_start.
  *
  * JSONLEX_FREE_STRUCT/STRVAL are used to drive freeJsonLexContext.
+ * JSONLEX_CTX_OWNS_TOKENS is used by setJsonLexContextOwnsTokens.
  */
 #define JSONLEX_FREE_STRUCT			(1 << 0)
 #define JSONLEX_FREE_STRVAL			(1 << 1)
+#define JSONLEX_CTX_OWNS_TOKENS		(1 << 2)
 typedef struct JsonLexContext
 {
 	const char *input;
@@ -130,9 +132,10 @@ typedef JsonParseErrorType (*json_scalar_action) (void *state, char *token, Json
  * to doing a pure parse with no side-effects, and is therefore exactly
  * what the json input routines do.
  *
- * The 'fname' and 'token' strings passed to these actions are palloc'd.
- * They are not free'd or used further by the parser, so the action function
- * is free to do what it wishes with them.
+ * By default, the 'fname' and 'token' strings passed to these actions are
+ * palloc'd.  They are not free'd or used further by the parser, so the action
+ * function is free to do what it wishes with them. This behavior may be
+ * modified by setJsonLexContextOwnsTokens().
  *
  * All action functions return JsonParseErrorType.  If the result isn't
  * JSON_SUCCESS, the parse is abandoned and that error code is returned.
@@ -216,6 +219,25 @@ extern JsonLexContext *makeJsonLexContextIncremental(JsonLexContext *lex,
 													 int encoding,
 													 bool need_escapes);
 
+/*
+ * Sets whether tokens passed to semantic action callbacks are owned by the
+ * context (in which case, the callback must duplicate the tokens for long-term
+ * storage) or by the callback (in which case, the callback must explicitly
+ * free tokens to avoid leaks).
+ *
+ * By default, this setting is false: the callback owns the tokens that are
+ * passed to it (and if parsing fails between the two object-field callbacks,
+ * the field name token will likely leak). If set to true, tokens will be freed
+ * by the lexer after the callback completes.
+ *
+ * Setting this to true is important for long-lived clients (such as libpq)
+ * that must not leak memory during a parse failure. For a server backend using
+ * memory contexts, or a client application which will exit on parse failure,
+ * this setting is less critical.
+ */
+extern void setJsonLexContextOwnsTokens(JsonLexContext *lex,
+										bool owned_by_context);
+
 extern void freeJsonLexContext(JsonLexContext *lex);
 
 /* lex one token */
diff --git a/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl b/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
index 8cc42e8e29..0c663b8e68 100644
--- a/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
+++ b/src/test/modules/test_json_parser/t/001_test_json_parser_incremental.pl
@@ -13,21 +13,24 @@ use FindBin;
 
 my $test_file = "$FindBin::RealBin/../tiny.json";
 
-my @exes =
-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
+my @exes = (
+	[ "test_json_parser_incremental", ],
+	[ "test_json_parser_incremental", "-o", ],
+	[ "test_json_parser_incremental_shlib", ],
+	[ "test_json_parser_incremental_shlib", "-o", ]);
 
 foreach my $exe (@exes)
 {
-	note "testing executable $exe";
+	note "testing executable @$exe";
 
 	# Test the  usage error
-	my ($stdout, $stderr) = run_command([ $exe, "-c", 10 ]);
+	my ($stdout, $stderr) = run_command([ @$exe, "-c", 10 ]);
 	like($stderr, qr/Usage:/, 'error message if not enough arguments');
 
 	# Test that we get success for small chunk sizes from 64 down to 1.
 	for (my $size = 64; $size > 0; $size--)
 	{
-		($stdout, $stderr) = run_command([ $exe, "-c", $size, $test_file ]);
+		($stdout, $stderr) = run_command([ @$exe, "-c", $size, $test_file ]);
 
 		like($stdout, qr/SUCCESS/, "chunk size $size: test succeeds");
 		is($stderr, "", "chunk size $size: no error output");
diff --git a/src/test/modules/test_json_parser/t/002_inline.pl b/src/test/modules/test_json_parser/t/002_inline.pl
index 5b6c6dc4ae..71c462b319 100644
--- a/src/test/modules/test_json_parser/t/002_inline.pl
+++ b/src/test/modules/test_json_parser/t/002_inline.pl
@@ -13,7 +13,7 @@ use Test::More;
 use File::Temp qw(tempfile);
 
 my $dir = PostgreSQL::Test::Utils::tempdir;
-my $exe;
+my @exe;
 
 sub test
 {
@@ -35,7 +35,7 @@ sub test
 
 	foreach my $size (reverse(1 .. $chunk))
 	{
-		my ($stdout, $stderr) = run_command([ $exe, "-c", $size, $fname ]);
+		my ($stdout, $stderr) = run_command([ @exe, "-c", $size, $fname ]);
 
 		if (defined($params{error}))
 		{
@@ -53,13 +53,16 @@ sub test
 	}
 }
 
-my @exes =
-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
+my @exes = (
+	[ "test_json_parser_incremental", ],
+	[ "test_json_parser_incremental", "-o", ],
+	[ "test_json_parser_incremental_shlib", ],
+	[ "test_json_parser_incremental_shlib", "-o", ]);
 
 foreach (@exes)
 {
-	$exe = $_;
-	note "testing executable $exe";
+	@exe = @$_;
+	note "testing executable @exe";
 
 	test("number", "12345");
 	test("string", '"hello"');
diff --git a/src/test/modules/test_json_parser/t/003_test_semantic.pl b/src/test/modules/test_json_parser/t/003_test_semantic.pl
index c11480172d..c57ccdb660 100644
--- a/src/test/modules/test_json_parser/t/003_test_semantic.pl
+++ b/src/test/modules/test_json_parser/t/003_test_semantic.pl
@@ -16,14 +16,17 @@ use File::Temp qw(tempfile);
 my $test_file = "$FindBin::RealBin/../tiny.json";
 my $test_out = "$FindBin::RealBin/../tiny.out";
 
-my @exes =
-  ("test_json_parser_incremental", "test_json_parser_incremental_shlib");
+my @exes = (
+	[ "test_json_parser_incremental", ],
+	[ "test_json_parser_incremental", "-o", ],
+	[ "test_json_parser_incremental_shlib", ],
+	[ "test_json_parser_incremental_shlib", "-o", ]);
 
 foreach my $exe (@exes)
 {
-	note "testing executable $exe";
+	note "testing executable @$exe";
 
-	my ($stdout, $stderr) = run_command([ $exe, "-s", $test_file ]);
+	my ($stdout, $stderr) = run_command([ @$exe, "-s", $test_file ]);
 
 	is($stderr, "", "no error output");
 
diff --git a/src/test/modules/test_json_parser/test_json_parser_incremental.c b/src/test/modules/test_json_parser/test_json_parser_incremental.c
index 294e5f74ea..0b02b5203b 100644
--- a/src/test/modules/test_json_parser/test_json_parser_incremental.c
+++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
@@ -18,6 +18,10 @@
  * If the -s flag is given, the program does semantic processing. This should
  * just mirror back the json, albeit with white space changes.
  *
+ * If the -o flag is given, the JSONLEX_CTX_OWNS_TOKENS flag is set. (This can
+ * be used in combination with a leak sanitizer; without the option, the parser
+ * may leak memory with invalid JSON.)
+ *
  * The argument specifies the file containing the JSON input.
  *
  *-------------------------------------------------------------------------
@@ -72,6 +76,8 @@ static JsonSemAction sem = {
 	.scalar = do_scalar
 };
 
+static bool lex_owns_tokens = false;
+
 int
 main(int argc, char **argv)
 {
@@ -88,10 +94,11 @@ main(int argc, char **argv)
 	char	   *testfile;
 	int			c;
 	bool		need_strings = false;
+	int			ret = 0;
 
 	pg_logging_init(argv[0]);
 
-	while ((c = getopt(argc, argv, "c:s")) != -1)
+	while ((c = getopt(argc, argv, "c:os")) != -1)
 	{
 		switch (c)
 		{
@@ -100,6 +107,9 @@ main(int argc, char **argv)
 				if (chunk_size > BUFSIZE)
 					pg_fatal("chunk size cannot exceed %d", BUFSIZE);
 				break;
+			case 'o':			/* switch token ownership */
+				lex_owns_tokens = true;
+				break;
 			case 's':			/* do semantic processing */
 				testsem = &sem;
 				sem.semstate = palloc(sizeof(struct DoState));
@@ -112,7 +122,7 @@ main(int argc, char **argv)
 
 	if (optind < argc)
 	{
-		testfile = pg_strdup(argv[optind]);
+		testfile = argv[optind];
 		optind++;
 	}
 	else
@@ -122,6 +132,7 @@ main(int argc, char **argv)
 	}
 
 	makeJsonLexContextIncremental(&lex, PG_UTF8, need_strings);
+	setJsonLexContextOwnsTokens(&lex, lex_owns_tokens);
 	initStringInfo(&json);
 
 	if ((json_file = fopen(testfile, PG_BINARY_R)) == NULL)
@@ -160,7 +171,8 @@ main(int argc, char **argv)
 			if (result != JSON_INCOMPLETE)
 			{
 				fprintf(stderr, "%s\n", json_errdetail(result, &lex));
-				exit(1);
+				ret = 1;
+				goto cleanup;
 			}
 			resetStringInfo(&json);
 		}
@@ -172,15 +184,21 @@ main(int argc, char **argv)
 			if (result != JSON_SUCCESS)
 			{
 				fprintf(stderr, "%s\n", json_errdetail(result, &lex));
-				exit(1);
+				ret = 1;
+				goto cleanup;
 			}
 			if (!need_strings)
 				printf("SUCCESS!\n");
 			break;
 		}
 	}
+
+cleanup:
 	fclose(json_file);
-	exit(0);
+	freeJsonLexContext(&lex);
+	free(json.data);
+
+	return ret;
 }
 
 /*
@@ -230,7 +248,8 @@ do_object_field_start(void *state, char *fname, bool isnull)
 static JsonParseErrorType
 do_object_field_end(void *state, char *fname, bool isnull)
 {
-	/* nothing to do really */
+	if (!lex_owns_tokens)
+		free(fname);
 
 	return JSON_SUCCESS;
 }
@@ -291,6 +310,9 @@ do_scalar(void *state, char *token, JsonTokenType tokentype)
 	else
 		printf("%s", token);
 
+	if (!lex_owns_tokens)
+		free(token);
+
 	return JSON_SUCCESS;
 }
 
@@ -343,7 +365,8 @@ usage(const char *progname)
 {
 	fprintf(stderr, "Usage: %s [OPTION ...] testfile\n", progname);
 	fprintf(stderr, "Options:\n");
-	fprintf(stderr, "  -c chunksize      size of piece fed to parser (default 64)n");
+	fprintf(stderr, "  -c chunksize      size of piece fed to parser (default 64)\n");
+	fprintf(stderr, "  -o                set JSONLEX_CTX_OWNS_TOKENS for leak checking\n");
 	fprintf(stderr, "  -s                do semantic processing\n");
 
 }
-- 
2.34.1

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Jacob Champion (#9)
Re: pg_parse_json() should not leak token copies on failure

On 2024-10-31 Th 5:10 PM, Jacob Champion wrote:

On Mon, Oct 7, 2024 at 3:45 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:

I've added a 0002 as well.

0002 has since been applied [1] so I'm reattaching v4-0001 to get the
cfbot happy again.

--Jacob

[1] https://git.postgresql.org/cgit/postgresql.git/commit/?id=41b023946d

Thanks, committed.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com