Server crash while trying to read expression using pg_get_expr()

Started by Rushabh Lathiaover 15 years ago20 messages
#1Rushabh Lathia
rushabh.lathia@gmail.com

Hi,

Server crash while trying to read expression(wrong) using pg_get_expr().

postgres=# SELECT pg_get_expr('{FUNCEXPR', 1255);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

Thanks,
Rushabh Lathia
www.EnterpriseDB.com

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Rushabh Lathia (#1)
1 attachment(s)
Re: Server crash while trying to read expression using pg_get_expr()

On 03/06/10 10:21, Rushabh Lathia wrote:

Server crash while trying to read expression(wrong) using pg_get_expr().

postgres=# SELECT pg_get_expr('{FUNCEXPR', 1255);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

In readfuncs.c, we don't check the return value of pg_strtok, and pass a
NULL to atoi(). The fix is pretty straightforward, we just have to be
more careful with validating the input, see attached patch.

However, I'm afraid we're lacking in input validation of read-funcs in
general. After some random hacking, I found this:

postgres=# SELECT pg_get_expr('{FUNCEXPR 1 2 3 4 4 5 6 7 8 9 9 } }', 1255);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Which still crashes despite the patch. Does anyone have an idea on how
to validate the input in a more wholesale fashion, so that we don't need
to plug these holes one by one?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

pg_get_expr-fix-1.patchtext/x-diff; name=pg_get_expr-fix-1.patchDownload
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bc6e2a6..0010572 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -57,66 +57,66 @@
 
 /* Read an integer field (anything written as ":fldname %d") */
 #define READ_INT_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = atoi(token)
 
 /* Read an unsigned integer field (anything written as ":fldname %u") */
 #define READ_UINT_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = atoui(token)
 
 /* Read an OID field (don't hard-wire assumption that OID is same as uint) */
 #define READ_OID_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = atooid(token)
 
 /* Read a char field (ie, one ascii character) */
 #define READ_CHAR_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = token[0]
 
 /* Read an enumerated-type field that was written as an integer code */
 #define READ_ENUM_FIELD(fldname, enumtype) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = (enumtype) atoi(token)
 
 /* Read a float field */
 #define READ_FLOAT_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = atof(token)
 
 /* Read a boolean field */
 #define READ_BOOL_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = strtobool(token)
 
 /* Read a character-string field */
 #define READ_STRING_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = nullable_string(token, length)
 
 /* Read a parse location field (and throw away the value, per notes above) */
 #define READ_LOCATION_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* get field value */ \
 	local_node->fldname = -1	/* set field to "unknown" */
 
 /* Read a Node field */
 #define READ_NODE_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
 	local_node->fldname = nodeRead(NULL, 0)
 
 /* Read a bitmapset field */
 #define READ_BITMAPSET_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
+	token = pg_strtok_mandatory(&length);		/* skip :fldname */ \
 	local_node->fldname = _readBitmapset()
 
 /* Routine exit */
@@ -139,8 +139,20 @@
 #define nullable_string(token,length)  \
 	((length) == 0 ? NULL : debackslash(token, length))
 
-
 static Datum readDatum(bool typbyval);
+static char *pg_strtok_mandatory(int *length);
+
+/*
+ * Like pg_strtok(), but throws an error on end of string
+ */
+static char *
+pg_strtok_mandatory(int *length)
+{
+	char *token = pg_strtok(length);
+	if (token == NULL)
+		elog(ERROR, "unexpected end of string while reading tree node");
+	return token;
+}
 
 /*
  * _readBitmapset
@@ -427,9 +439,9 @@ _readConst(void)
 	READ_BOOL_FIELD(constisnull);
 	READ_LOCATION_FIELD(location);
 
-	token = pg_strtok(&length); /* skip :constvalue */
+	token = pg_strtok_mandatory(&length); /* skip :constvalue */
 	if (local_node->constisnull)
-		token = pg_strtok(&length);		/* skip "<>" */
+		token = pg_strtok_mandatory(&length);		/* skip "<>" */
 	else
 		local_node->constvalue = readDatum(local_node->constbyval);
 
@@ -640,8 +652,8 @@ _readBoolExpr(void)
 	READ_LOCALS(BoolExpr);
 
 	/* do-it-yourself enum representation */
-	token = pg_strtok(&length); /* skip :boolop */
-	token = pg_strtok(&length); /* get field value */
+	token = pg_strtok_mandatory(&length); /* skip :boolop */
+	token = pg_strtok_mandatory(&length); /* get field value */
 	if (strncmp(token, "and", 3) == 0)
 		local_node->boolop = AND_EXPR;
 	else if (strncmp(token, "or", 2) == 0)
@@ -1182,7 +1194,7 @@ _readRangeTblEntry(void)
  * Given a character string representing a node tree, parseNodeString creates
  * the internal node structure.
  *
- * The string to be read must already have been loaded into pg_strtok().
+ * The string to be read must already have been loaded into pg_strtok_mandatory().
  */
 Node *
 parseNodeString(void)
@@ -1191,7 +1203,7 @@ parseNodeString(void)
 
 	READ_TEMP_LOCALS();
 
-	token = pg_strtok(&length);
+	token = pg_strtok_mandatory(&length);
 
 #define MATCH(tokname, namelen) \
 	(length == namelen && strncmp(token, tokname, namelen) == 0)
@@ -1328,7 +1340,7 @@ readDatum(bool typbyval)
 	/*
 	 * read the actual length of the value
 	 */
-	token = pg_strtok(&tokenLength);
+	token = pg_strtok_mandatory(&tokenLength);
 	length = atoui(token);
 
 	token = pg_strtok(&tokenLength);	/* read the '[' */
@@ -1346,7 +1358,7 @@ readDatum(bool typbyval)
 		s = (char *) (&res);
 		for (i = 0; i < (Size) sizeof(Datum); i++)
 		{
-			token = pg_strtok(&tokenLength);
+			token = pg_strtok_mandatory(&tokenLength);
 			s[i] = (char) atoi(token);
 		}
 	}
@@ -1357,7 +1369,7 @@ readDatum(bool typbyval)
 		s = (char *) palloc(length);
 		for (i = 0; i < length; i++)
 		{
-			token = pg_strtok(&tokenLength);
+			token = pg_strtok_mandatory(&tokenLength);
 			s[i] = (char) atoi(token);
 		}
 		res = PointerGetDatum(s);
#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#2)
1 attachment(s)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

(moving to pgsql-hackers)

On 03/06/10 10:37, Heikki Linnakangas wrote:

However, I'm afraid we're lacking in input validation of read-funcs in
general. ...

Does anyone have an idea on how
to validate the input in a more wholesale fashion, so that we don't need
to plug these holes one by one?

Apparently not :-(.

We have two options:

1. Make pg_get_expr() handle arbitrary (possibly even malicious) input
gracefully.

2. Restrict pg_get_expr() to superusers only.

Does anyone want to argue for option 2? We could create views using
pg_get_expr() over the internal catalogs that store trees, so that
regular users could access those without being able to pass arbitrary
input to pg_get_expr(). However, it would break existing applications,
at least pgAdmin uses pg_get_expr().

Assuming we want to make pg_get_expr() check its input, we need to:

* plug the hole Rushabh reported, and not crash on premature end of string
* check all Node types, so that you e.g. can't pass an Integer in a
field that's supposed to hold a CaseWhenExpr
* similarly, check that all Lists contain elements of the right type.

This can all be done in a straightforward way in readfuncs.c, we just
need a bit more decoration to all the READ_* macros. However, that's
still not enough; the functions in ruleutils.c make a number of other
assumptions, like that an OpExpr always has exactly one or two
arguments. That kind of assumptions will need to be explicitly checked.
Many of them already have Asserts, they need to be turned into elogs.

Thoughts? Attached is a patch for the readfuncs.c changes. Unless
someone has a better idea, I'll start going through ruleutils.c and add
explicit checks for any unsafe assumptions. It's going to be a lot of
work, as there's a lot of code in ruleutils.c and the changes need to be
backpatched as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

readfuncs-checks-1.patchtext/x-diff; name=readfuncs-checks-1.patchDownload
diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index 5af15c2..d0157d0 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -50,7 +50,7 @@ stringToNode(char *str)
 
 	pg_strtok_ptr = str;		/* point pg_strtok at the string to read */
 
-	retval = nodeRead(NULL, 0); /* do the reading */
+	retval = nodeRead(NULL, 0, 0); /* do the reading */
 
 	pg_strtok_ptr = save_strtok;
 
@@ -266,15 +266,18 @@ nodeTokenType(char *token, int length)
  * The return value is declared void *, not Node *, to avoid having to
  * cast it explicitly in callers that assign to fields of different types.
  *
- * External callers should always pass NULL/0 for the arguments.  Internally
+ * External callers should always pass NULL/0 for the token/tok_lenarguments.  Internally
  * a non-NULL token may be passed when the upper recursion level has already
  * scanned the first token of a node's representation.
  *
+ * If expectedType is non-zero, the node must of the given type, or an error
+ * is thrown.
+ *
  * We assume pg_strtok is already initialized with a string to read (hence
  * this should only be invoked from within a stringToNode operation).
  */
 void *
-nodeRead(char *token, int tok_len)
+nodeRead(char *token, int tok_len, int expectedType)
 {
 	Node	   *result;
 	NodeTag		type;
@@ -358,7 +361,7 @@ nodeRead(char *token, int tok_len)
 						/* We have already scanned next token... */
 						if (token[0] == ')')
 							break;
-						l = lappend(l, nodeRead(token, tok_len));
+						l = lappend(l, nodeRead(token, tok_len, 0));
 						token = pg_strtok(&tok_len);
 						if (token == NULL)
 							elog(ERROR, "unterminated List structure");
@@ -419,5 +422,9 @@ nodeRead(char *token, int tok_len)
 			break;
 	}
 
+	if (expectedType != 0 && nodeTag(result) != expectedType)
+		elog(ERROR, "node type %d found, expected %d",
+			 nodeTag(result), expectedType);
+
 	return (void *) result;
 }
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bc6e2a6..cb814fd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -57,66 +57,90 @@
 
 /* Read an integer field (anything written as ":fldname %d") */
 #define READ_INT_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = atoi(token)
 
 /* Read an unsigned integer field (anything written as ":fldname %u") */
 #define READ_UINT_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = atoui(token)
 
 /* Read an OID field (don't hard-wire assumption that OID is same as uint) */
 #define READ_OID_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = atooid(token)
 
 /* Read a char field (ie, one ascii character) */
 #define READ_CHAR_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = token[0]
 
 /* Read an enumerated-type field that was written as an integer code */
 #define READ_ENUM_FIELD(fldname, enumtype) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = (enumtype) atoi(token)
 
 /* Read a float field */
 #define READ_FLOAT_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = atof(token)
 
 /* Read a boolean field */
 #define READ_BOOL_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = strtobool(token)
 
 /* Read a character-string field */
 #define READ_STRING_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = nullable_string(token, length)
 
 /* Read a parse location field (and throw away the value, per notes above) */
 #define READ_LOCATION_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	token = pg_strtok(&length);		/* get field value */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* get field value */ \
 	local_node->fldname = -1	/* set field to "unknown" */
 
 /* Read a Node field */
-#define READ_NODE_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
-	local_node->fldname = nodeRead(NULL, 0)
+#define READ_NODE_FIELD(fldname, type) \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	local_node->fldname = (type *) nodeRead(NULL, 0, T_##type)
+
+/* Read a Node field */
+#define READ_ANY_NODE_FIELD(fldname) \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	local_node->fldname = nodeRead(NULL, 0, 0)
+
+/* Read a List field */
+#define READ_LIST_FIELD(fldname, type)							\
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	local_node->fldname = readList(T_##type)
+
+#define READ_ANY_LIST_FIELD(fldname)							\
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	local_node->fldname = (List *) nodeRead(NULL, 0, T_List);
+
+/* Read a IntList field */
+#define READ_INT_LIST_FIELD(fldname) \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	local_node->fldname = nodeRead(NULL, 0, T_IntList)
+
+/* Read a OidList field */
+#define READ_OID_LIST_FIELD(fldname)						\
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
+	local_node->fldname = nodeRead(NULL, 0, T_OidList)
 
 /* Read a bitmapset field */
 #define READ_BITMAPSET_FIELD(fldname) \
-	token = pg_strtok(&length);		/* skip :fldname */ \
+	token = pg_strtok_e(&length);		/* skip :fldname */ \
 	local_node->fldname = _readBitmapset()
 
 /* Routine exit */
@@ -139,8 +163,21 @@
 #define nullable_string(token,length)  \
 	((length) == 0 ? NULL : debackslash(token, length))
 
-
 static Datum readDatum(bool typbyval);
+static List *readList(int elemtype);
+static char *pg_strtok_e(int *length);
+
+/*
+ * Like pg_strtok(), but throws an error on end of string
+ */
+static char *
+pg_strtok_e(int *length)
+{
+	char *token = pg_strtok(length);
+	if (token == NULL)
+		elog(ERROR, "unexpected end of string while reading tree node");
+	return token;
+}
 
 /*
  * _readBitmapset
@@ -195,29 +232,29 @@ _readQuery(void)
 	READ_ENUM_FIELD(commandType, CmdType);
 	READ_ENUM_FIELD(querySource, QuerySource);
 	READ_BOOL_FIELD(canSetTag);
-	READ_NODE_FIELD(utilityStmt);
+	READ_ANY_NODE_FIELD(utilityStmt);
 	READ_INT_FIELD(resultRelation);
-	READ_NODE_FIELD(intoClause);
+	READ_NODE_FIELD(intoClause, IntoClause);
 	READ_BOOL_FIELD(hasAggs);
 	READ_BOOL_FIELD(hasWindowFuncs);
 	READ_BOOL_FIELD(hasSubLinks);
 	READ_BOOL_FIELD(hasDistinctOn);
 	READ_BOOL_FIELD(hasRecursive);
 	READ_BOOL_FIELD(hasForUpdate);
-	READ_NODE_FIELD(cteList);
-	READ_NODE_FIELD(rtable);
-	READ_NODE_FIELD(jointree);
-	READ_NODE_FIELD(targetList);
-	READ_NODE_FIELD(returningList);
-	READ_NODE_FIELD(groupClause);
-	READ_NODE_FIELD(havingQual);
-	READ_NODE_FIELD(windowClause);
-	READ_NODE_FIELD(distinctClause);
-	READ_NODE_FIELD(sortClause);
-	READ_NODE_FIELD(limitOffset);
-	READ_NODE_FIELD(limitCount);
-	READ_NODE_FIELD(rowMarks);
-	READ_NODE_FIELD(setOperations);
+	READ_LIST_FIELD(cteList, CommonTableExpr);
+	READ_LIST_FIELD(rtable, RangeTblEntry);
+	READ_NODE_FIELD(jointree, FromExpr);
+	READ_LIST_FIELD(targetList, TargetEntry);
+	READ_LIST_FIELD(returningList, TargetEntry);
+	READ_LIST_FIELD(groupClause, SortGroupClause);
+	READ_ANY_NODE_FIELD(havingQual);
+	READ_LIST_FIELD(windowClause, WindowClause);
+	READ_LIST_FIELD(distinctClause, SortGroupClause);
+	READ_LIST_FIELD(sortClause, SortGroupClause);
+	READ_ANY_NODE_FIELD(limitOffset);
+	READ_ANY_NODE_FIELD(limitCount);
+	READ_LIST_FIELD(rowMarks, RowMarkClause);
+	READ_ANY_NODE_FIELD(setOperations);
 
 	READ_DONE();
 }
@@ -246,7 +283,7 @@ _readDeclareCursorStmt(void)
 
 	READ_STRING_FIELD(portalname);
 	READ_INT_FIELD(options);
-	READ_NODE_FIELD(query);
+	READ_ANY_NODE_FIELD(query);
 
 	READ_DONE();
 }
@@ -277,11 +314,11 @@ _readWindowClause(void)
 
 	READ_STRING_FIELD(name);
 	READ_STRING_FIELD(refname);
-	READ_NODE_FIELD(partitionClause);
-	READ_NODE_FIELD(orderClause);
+	READ_LIST_FIELD(partitionClause, SortGroupClause);
+	READ_LIST_FIELD(orderClause, SortGroupClause);
 	READ_INT_FIELD(frameOptions);
-	READ_NODE_FIELD(startOffset);
-	READ_NODE_FIELD(endOffset);
+	READ_ANY_NODE_FIELD(startOffset);
+	READ_ANY_NODE_FIELD(endOffset);
 	READ_UINT_FIELD(winref);
 	READ_BOOL_FIELD(copiedOrder);
 
@@ -313,14 +350,14 @@ _readCommonTableExpr(void)
 	READ_LOCALS(CommonTableExpr);
 
 	READ_STRING_FIELD(ctename);
-	READ_NODE_FIELD(aliascolnames);
-	READ_NODE_FIELD(ctequery);
+	READ_LIST_FIELD(aliascolnames, String);
+	READ_ANY_NODE_FIELD(ctequery);
 	READ_LOCATION_FIELD(location);
 	READ_BOOL_FIELD(cterecursive);
 	READ_INT_FIELD(cterefcount);
-	READ_NODE_FIELD(ctecolnames);
-	READ_NODE_FIELD(ctecoltypes);
-	READ_NODE_FIELD(ctecoltypmods);
+	READ_LIST_FIELD(ctecolnames, String);
+	READ_OID_LIST_FIELD(ctecoltypes);
+	READ_INT_LIST_FIELD(ctecoltypmods);
 
 	READ_DONE();
 }
@@ -335,11 +372,11 @@ _readSetOperationStmt(void)
 
 	READ_ENUM_FIELD(op, SetOperation);
 	READ_BOOL_FIELD(all);
-	READ_NODE_FIELD(larg);
-	READ_NODE_FIELD(rarg);
-	READ_NODE_FIELD(colTypes);
-	READ_NODE_FIELD(colTypmods);
-	READ_NODE_FIELD(groupClauses);
+	READ_ANY_NODE_FIELD(larg);
+	READ_ANY_NODE_FIELD(rarg);
+	READ_OID_LIST_FIELD(colTypes);
+	READ_INT_LIST_FIELD(colTypmods);
+	READ_LIST_FIELD(groupClauses, SortGroupClause);
 
 	READ_DONE();
 }
@@ -355,7 +392,7 @@ _readAlias(void)
 	READ_LOCALS(Alias);
 
 	READ_STRING_FIELD(aliasname);
-	READ_NODE_FIELD(colnames);
+	READ_LIST_FIELD(colnames, String);
 
 	READ_DONE();
 }
@@ -372,7 +409,7 @@ _readRangeVar(void)
 	READ_STRING_FIELD(relname);
 	READ_ENUM_FIELD(inhOpt, InhOption);
 	READ_BOOL_FIELD(istemp);
-	READ_NODE_FIELD(alias);
+	READ_NODE_FIELD(alias, Alias);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -383,9 +420,9 @@ _readIntoClause(void)
 {
 	READ_LOCALS(IntoClause);
 
-	READ_NODE_FIELD(rel);
-	READ_NODE_FIELD(colNames);
-	READ_NODE_FIELD(options);
+	READ_NODE_FIELD(rel, RangeVar);
+	READ_LIST_FIELD(colNames, ColumnDef);
+	READ_LIST_FIELD(options, DefElem);
 	READ_ENUM_FIELD(onCommit, OnCommitAction);
 	READ_STRING_FIELD(tableSpaceName);
 
@@ -463,9 +500,9 @@ _readAggref(void)
 
 	READ_OID_FIELD(aggfnoid);
 	READ_OID_FIELD(aggtype);
-	READ_NODE_FIELD(args);
-	READ_NODE_FIELD(aggorder);
-	READ_NODE_FIELD(aggdistinct);
+	READ_LIST_FIELD(args, TargetEntry);
+	READ_LIST_FIELD(aggorder, SortGroupClause);
+	READ_LIST_FIELD(aggdistinct, SortGroupClause);
 	READ_BOOL_FIELD(aggstar);
 	READ_UINT_FIELD(agglevelsup);
 	READ_LOCATION_FIELD(location);
@@ -483,7 +520,7 @@ _readWindowFunc(void)
 
 	READ_OID_FIELD(winfnoid);
 	READ_OID_FIELD(wintype);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_UINT_FIELD(winref);
 	READ_BOOL_FIELD(winstar);
 	READ_BOOL_FIELD(winagg);
@@ -503,10 +540,10 @@ _readArrayRef(void)
 	READ_OID_FIELD(refarraytype);
 	READ_OID_FIELD(refelemtype);
 	READ_INT_FIELD(reftypmod);
-	READ_NODE_FIELD(refupperindexpr);
-	READ_NODE_FIELD(reflowerindexpr);
-	READ_NODE_FIELD(refexpr);
-	READ_NODE_FIELD(refassgnexpr);
+	READ_ANY_LIST_FIELD(refupperindexpr);
+	READ_ANY_LIST_FIELD(reflowerindexpr);
+	READ_ANY_NODE_FIELD(refexpr);
+	READ_ANY_NODE_FIELD(refassgnexpr);
 
 	READ_DONE();
 }
@@ -523,7 +560,7 @@ _readFuncExpr(void)
 	READ_OID_FIELD(funcresulttype);
 	READ_BOOL_FIELD(funcretset);
 	READ_ENUM_FIELD(funcformat, CoercionForm);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -537,7 +574,7 @@ _readNamedArgExpr(void)
 {
 	READ_LOCALS(NamedArgExpr);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_STRING_FIELD(name);
 	READ_INT_FIELD(argnumber);
 	READ_LOCATION_FIELD(location);
@@ -568,7 +605,7 @@ _readOpExpr(void)
 
 	READ_OID_FIELD(opresulttype);
 	READ_BOOL_FIELD(opretset);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -597,7 +634,7 @@ _readDistinctExpr(void)
 
 	READ_OID_FIELD(opresulttype);
 	READ_BOOL_FIELD(opretset);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -625,7 +662,7 @@ _readScalarArrayOpExpr(void)
 	local_node->opfuncid = InvalidOid;
 
 	READ_BOOL_FIELD(useOr);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -640,8 +677,8 @@ _readBoolExpr(void)
 	READ_LOCALS(BoolExpr);
 
 	/* do-it-yourself enum representation */
-	token = pg_strtok(&length); /* skip :boolop */
-	token = pg_strtok(&length); /* get field value */
+	token = pg_strtok_e(&length); /* skip :boolop */
+	token = pg_strtok_e(&length); /* get field value */
 	if (strncmp(token, "and", 3) == 0)
 		local_node->boolop = AND_EXPR;
 	else if (strncmp(token, "or", 2) == 0)
@@ -651,7 +688,7 @@ _readBoolExpr(void)
 	else
 		elog(ERROR, "unrecognized boolop \"%.*s\"", length, token);
 
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -666,9 +703,9 @@ _readSubLink(void)
 	READ_LOCALS(SubLink);
 
 	READ_ENUM_FIELD(subLinkType, SubLinkType);
-	READ_NODE_FIELD(testexpr);
-	READ_NODE_FIELD(operName);
-	READ_NODE_FIELD(subselect);
+	READ_ANY_NODE_FIELD(testexpr);
+	READ_LIST_FIELD(operName, String);
+	READ_ANY_NODE_FIELD(subselect);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -686,7 +723,7 @@ _readFieldSelect(void)
 {
 	READ_LOCALS(FieldSelect);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_INT_FIELD(fieldnum);
 	READ_OID_FIELD(resulttype);
 	READ_INT_FIELD(resulttypmod);
@@ -702,9 +739,9 @@ _readFieldStore(void)
 {
 	READ_LOCALS(FieldStore);
 
-	READ_NODE_FIELD(arg);
-	READ_NODE_FIELD(newvals);
-	READ_NODE_FIELD(fieldnums);
+	READ_ANY_NODE_FIELD(arg);
+	READ_ANY_LIST_FIELD(newvals);
+	READ_INT_LIST_FIELD(fieldnums);
 	READ_OID_FIELD(resulttype);
 
 	READ_DONE();
@@ -718,7 +755,7 @@ _readRelabelType(void)
 {
 	READ_LOCALS(RelabelType);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_OID_FIELD(resulttype);
 	READ_INT_FIELD(resulttypmod);
 	READ_ENUM_FIELD(relabelformat, CoercionForm);
@@ -735,7 +772,7 @@ _readCoerceViaIO(void)
 {
 	READ_LOCALS(CoerceViaIO);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_OID_FIELD(resulttype);
 	READ_ENUM_FIELD(coerceformat, CoercionForm);
 	READ_LOCATION_FIELD(location);
@@ -751,7 +788,7 @@ _readArrayCoerceExpr(void)
 {
 	READ_LOCALS(ArrayCoerceExpr);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_OID_FIELD(elemfuncid);
 	READ_OID_FIELD(resulttype);
 	READ_INT_FIELD(resulttypmod);
@@ -770,7 +807,7 @@ _readConvertRowtypeExpr(void)
 {
 	READ_LOCALS(ConvertRowtypeExpr);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_OID_FIELD(resulttype);
 	READ_ENUM_FIELD(convertformat, CoercionForm);
 	READ_LOCATION_FIELD(location);
@@ -787,9 +824,9 @@ _readCaseExpr(void)
 	READ_LOCALS(CaseExpr);
 
 	READ_OID_FIELD(casetype);
-	READ_NODE_FIELD(arg);
-	READ_NODE_FIELD(args);
-	READ_NODE_FIELD(defresult);
+	READ_ANY_NODE_FIELD(arg);
+	READ_LIST_FIELD(args, CaseWhen);
+	READ_ANY_NODE_FIELD(defresult);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -803,8 +840,8 @@ _readCaseWhen(void)
 {
 	READ_LOCALS(CaseWhen);
 
-	READ_NODE_FIELD(expr);
-	READ_NODE_FIELD(result);
+	READ_ANY_NODE_FIELD(expr);
+	READ_ANY_NODE_FIELD(result);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -834,7 +871,7 @@ _readArrayExpr(void)
 
 	READ_OID_FIELD(array_typeid);
 	READ_OID_FIELD(element_typeid);
-	READ_NODE_FIELD(elements);
+	READ_ANY_LIST_FIELD(elements);
 	READ_BOOL_FIELD(multidims);
 	READ_LOCATION_FIELD(location);
 
@@ -849,10 +886,10 @@ _readRowExpr(void)
 {
 	READ_LOCALS(RowExpr);
 
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_OID_FIELD(row_typeid);
 	READ_ENUM_FIELD(row_format, CoercionForm);
-	READ_NODE_FIELD(colnames);
+	READ_LIST_FIELD(colnames, String);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -867,10 +904,10 @@ _readRowCompareExpr(void)
 	READ_LOCALS(RowCompareExpr);
 
 	READ_ENUM_FIELD(rctype, RowCompareType);
-	READ_NODE_FIELD(opnos);
-	READ_NODE_FIELD(opfamilies);
-	READ_NODE_FIELD(largs);
-	READ_NODE_FIELD(rargs);
+	READ_OID_LIST_FIELD(opnos);
+	READ_OID_LIST_FIELD(opfamilies);
+	READ_ANY_LIST_FIELD(largs);
+	READ_ANY_LIST_FIELD(rargs);
 
 	READ_DONE();
 }
@@ -884,7 +921,7 @@ _readCoalesceExpr(void)
 	READ_LOCALS(CoalesceExpr);
 
 	READ_OID_FIELD(coalescetype);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -900,7 +937,7 @@ _readMinMaxExpr(void)
 
 	READ_OID_FIELD(minmaxtype);
 	READ_ENUM_FIELD(op, MinMaxOp);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -916,9 +953,9 @@ _readXmlExpr(void)
 
 	READ_ENUM_FIELD(op, XmlExprOp);
 	READ_STRING_FIELD(name);
-	READ_NODE_FIELD(named_args);
-	READ_NODE_FIELD(arg_names);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(named_args);
+	READ_LIST_FIELD(arg_names, String);
+	READ_ANY_LIST_FIELD(args);
 	READ_ENUM_FIELD(xmloption, XmlOptionType);
 	READ_OID_FIELD(type);
 	READ_INT_FIELD(typmod);
@@ -950,7 +987,7 @@ _readNullIfExpr(void)
 
 	READ_OID_FIELD(opresulttype);
 	READ_BOOL_FIELD(opretset);
-	READ_NODE_FIELD(args);
+	READ_ANY_LIST_FIELD(args);
 	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
@@ -964,7 +1001,7 @@ _readNullTest(void)
 {
 	READ_LOCALS(NullTest);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_ENUM_FIELD(nulltesttype, NullTestType);
 	READ_BOOL_FIELD(argisrow);
 
@@ -979,7 +1016,7 @@ _readBooleanTest(void)
 {
 	READ_LOCALS(BooleanTest);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_ENUM_FIELD(booltesttype, BoolTestType);
 
 	READ_DONE();
@@ -993,7 +1030,7 @@ _readCoerceToDomain(void)
 {
 	READ_LOCALS(CoerceToDomain);
 
-	READ_NODE_FIELD(arg);
+	READ_ANY_NODE_FIELD(arg);
 	READ_OID_FIELD(resulttype);
 	READ_INT_FIELD(resulttypmod);
 	READ_ENUM_FIELD(coercionformat, CoercionForm);
@@ -1055,7 +1092,7 @@ _readTargetEntry(void)
 {
 	READ_LOCALS(TargetEntry);
 
-	READ_NODE_FIELD(expr);
+	READ_ANY_NODE_FIELD(expr);
 	READ_INT_FIELD(resno);
 	READ_STRING_FIELD(resname);
 	READ_UINT_FIELD(ressortgroupref);
@@ -1089,11 +1126,11 @@ _readJoinExpr(void)
 
 	READ_ENUM_FIELD(jointype, JoinType);
 	READ_BOOL_FIELD(isNatural);
-	READ_NODE_FIELD(larg);
-	READ_NODE_FIELD(rarg);
-	READ_NODE_FIELD(usingClause);
-	READ_NODE_FIELD(quals);
-	READ_NODE_FIELD(alias);
+	READ_ANY_NODE_FIELD(larg);
+	READ_ANY_NODE_FIELD(rarg);
+	READ_LIST_FIELD(usingClause, String);
+	READ_ANY_NODE_FIELD(quals);
+	READ_NODE_FIELD(alias, Alias);
 	READ_INT_FIELD(rtindex);
 
 	READ_DONE();
@@ -1107,8 +1144,8 @@ _readFromExpr(void)
 {
 	READ_LOCALS(FromExpr);
 
-	READ_NODE_FIELD(fromlist);
-	READ_NODE_FIELD(quals);
+	READ_ANY_LIST_FIELD(fromlist);
+	READ_ANY_NODE_FIELD(quals);
 
 	READ_DONE();
 }
@@ -1127,8 +1164,8 @@ _readRangeTblEntry(void)
 	READ_LOCALS(RangeTblEntry);
 
 	/* put alias + eref first to make dump more legible */
-	READ_NODE_FIELD(alias);
-	READ_NODE_FIELD(eref);
+	READ_NODE_FIELD(alias, Alias);
+	READ_NODE_FIELD(eref, Alias);
 	READ_ENUM_FIELD(rtekind, RTEKind);
 
 	switch (local_node->rtekind)
@@ -1138,26 +1175,26 @@ _readRangeTblEntry(void)
 			READ_OID_FIELD(relid);
 			break;
 		case RTE_SUBQUERY:
-			READ_NODE_FIELD(subquery);
+			READ_NODE_FIELD(subquery, Query);
 			break;
 		case RTE_JOIN:
 			READ_ENUM_FIELD(jointype, JoinType);
-			READ_NODE_FIELD(joinaliasvars);
+			READ_ANY_LIST_FIELD(joinaliasvars);
 			break;
 		case RTE_FUNCTION:
-			READ_NODE_FIELD(funcexpr);
-			READ_NODE_FIELD(funccoltypes);
-			READ_NODE_FIELD(funccoltypmods);
+			READ_ANY_NODE_FIELD(funcexpr);
+			READ_OID_LIST_FIELD(funccoltypes);
+			READ_INT_LIST_FIELD(funccoltypmods);
 			break;
 		case RTE_VALUES:
-			READ_NODE_FIELD(values_lists);
+			READ_LIST_FIELD(values_lists, List);
 			break;
 		case RTE_CTE:
 			READ_STRING_FIELD(ctename);
 			READ_UINT_FIELD(ctelevelsup);
 			READ_BOOL_FIELD(self_reference);
-			READ_NODE_FIELD(ctecoltypes);
-			READ_NODE_FIELD(ctecoltypmods);
+			READ_OID_LIST_FIELD(ctecoltypes);
+			READ_INT_LIST_FIELD(ctecoltypmods);
 			break;
 		default:
 			elog(ERROR, "unrecognized RTE kind: %d",
@@ -1191,7 +1228,7 @@ parseNodeString(void)
 
 	READ_TEMP_LOCALS();
 
-	token = pg_strtok(&length);
+	token = pg_strtok_e(&length);
 
 #define MATCH(tokname, namelen) \
 	(length == namelen && strncmp(token, tokname, namelen) == 0)
@@ -1328,7 +1365,7 @@ readDatum(bool typbyval)
 	/*
 	 * read the actual length of the value
 	 */
-	token = pg_strtok(&tokenLength);
+	token = pg_strtok_e(&tokenLength);
 	length = atoui(token);
 
 	token = pg_strtok(&tokenLength);	/* read the '[' */
@@ -1346,7 +1383,7 @@ readDatum(bool typbyval)
 		s = (char *) (&res);
 		for (i = 0; i < (Size) sizeof(Datum); i++)
 		{
-			token = pg_strtok(&tokenLength);
+			token = pg_strtok_e(&tokenLength);
 			s[i] = (char) atoi(token);
 		}
 	}
@@ -1357,7 +1394,7 @@ readDatum(bool typbyval)
 		s = (char *) palloc(length);
 		for (i = 0; i < length; i++)
 		{
-			token = pg_strtok(&tokenLength);
+			token = pg_strtok_e(&tokenLength);
 			s[i] = (char) atoi(token);
 		}
 		res = PointerGetDatum(s);
@@ -1371,3 +1408,21 @@ readDatum(bool typbyval)
 
 	return res;
 }
+
+/*
+ * Reads a node list, and checks that all elements are of the given type.
+ */
+static List *
+readList(int elemtype)
+{
+	List *l = (List *) nodeRead(NULL, 0, T_List);
+	ListCell *lc;
+
+	foreach(lc, l)
+	{
+		if (nodeTag(lfirst(lc)) != elemtype)
+			elog(ERROR, "element type %d found in List, expected %d",
+				 nodeTag(lfirst(lc)), elemtype);
+	}
+	return l;
+}
diff --git a/src/include/nodes/readfuncs.h b/src/include/nodes/readfuncs.h
index f373b32..03f75aa 100644
--- a/src/include/nodes/readfuncs.h
+++ b/src/include/nodes/readfuncs.h
@@ -21,7 +21,7 @@
  */
 extern char *pg_strtok(int *length);
 extern char *debackslash(char *token, int length);
-extern void *nodeRead(char *token, int tok_len);
+extern void *nodeRead(char *token, int tok_len, int expectedType);
 
 /*
  * prototypes for functions in readfuncs.c
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#3)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

We have two options:

1. Make pg_get_expr() handle arbitrary (possibly even malicious) input
gracefully.

2. Restrict pg_get_expr() to superusers only.

I think #1 is a fool's errand. There is far too much structure to a
node tree that is outside the scope of what readfuncs.c is capable of
understanding.

regards, tom lane

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#4)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On 09/06/10 17:34, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

We have two options:

1. Make pg_get_expr() handle arbitrary (possibly even malicious) input
gracefully.

2. Restrict pg_get_expr() to superusers only.

I think #1 is a fool's errand. There is far too much structure to a
node tree that is outside the scope of what readfuncs.c is capable of
understanding.

That's why I said that ruleutils.c will need to understand and complain
about the rest.

Are you thinking we should retrict pg_get_expr() to superusers then?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#4)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On 09/06/10 17:34, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

We have two options:

1. Make pg_get_expr() handle arbitrary (possibly even malicious) input
gracefully.

2. Restrict pg_get_expr() to superusers only.

I think #1 is a fool's errand. There is far too much structure to a
node tree that is outside the scope of what readfuncs.c is capable of
understanding.

That's why I said that ruleutils.c will need to understand and complain
about the rest.

Are you thinking we should restrict pg_get_expr() to superusers then?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#5)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 09/06/10 17:34, Tom Lane wrote:

I think #1 is a fool's errand. There is far too much structure to a
node tree that is outside the scope of what readfuncs.c is capable of
understanding.

That's why I said that ruleutils.c will need to understand and complain
about the rest.

And that's what I'm telling you is a hopeless task.

Are you thinking we should retrict pg_get_expr() to superusers then?

I think that's the only solution that will actually fix the problem,
rather than lead to a never-ending series of security bugs. In
hindsight we should never have exposed that function in that form.

regards, tom lane

#8Kris Jurka
books@ejurka.com
In reply to: Heikki Linnakangas (#5)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On Wed, 9 Jun 2010, Heikki Linnakangas wrote:

Are you thinking we should retrict pg_get_expr() to superusers then?

That seems like it will cause problems for both pg_dump and drivers which
want to return metadata as pg_get_expr has been the recommended way of
fetching this information.

The JDBC driver uses pg_get_expr to decode both pg_attrdef.adbin and
pg_index.indpred.

Kris Jurka

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kris Jurka (#8)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

Kris Jurka <books@ejurka.com> writes:

On Wed, 9 Jun 2010, Heikki Linnakangas wrote:

Are you thinking we should retrict pg_get_expr() to superusers then?

That seems like it will cause problems for both pg_dump and drivers which
want to return metadata as pg_get_expr has been the recommended way of
fetching this information.

Yes, it's not a trivial fix either. We'll have to provide functions or
views that replace the current usages without letting the user insert
untrusted strings.

regards, tom lane

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#9)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kris Jurka <books@ejurka.com> writes:

On Wed, 9 Jun 2010, Heikki Linnakangas wrote:

Are you thinking we should retrict pg_get_expr() to superusers then?

That seems like it will cause problems for both pg_dump and drivers which
want to return metadata as pg_get_expr has been the recommended way of
fetching this information.

Yes, it's not a trivial fix either.  We'll have to provide functions or
views that replace the current usages without letting the user insert
untrusted strings.

Maybe I'm all wet here, but don't we need to come up with something we
can back-patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yes, it's not a trivial fix either. �We'll have to provide functions or
views that replace the current usages without letting the user insert
untrusted strings.

Maybe I'm all wet here, but don't we need to come up with something we
can back-patch?

Well, ideally yes, but if it's not actually *secure* then there's no
point --- and I don't believe that the approach of making readfuncs.c
secure against malicious input has the proverbial snowball's chance
of ever being bulletproof.

[ thinks for awhile... ] I wonder whether there is any way of locking
down pg_get_expr so that it throws an error if called with anything
except a suitable field from one of the system catalogs. There are only
a few usage patterns that we need to allow, no? At least in recent PG
versions it is possible for the function to check that its input
expression is a Var. If we had some (probably horridly ugly) way to
obtain the rangetable entry the Var refers to, we could put code into
pg_get_expr to barf if it's not used in a context like
"select pg_get_expr(adbin) from pg_attrdef".

regards, tom lane

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On Wed, Jun 9, 2010 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yes, it's not a trivial fix either.  We'll have to provide functions or
views that replace the current usages without letting the user insert
untrusted strings.

Maybe I'm all wet here, but don't we need to come up with something we
can back-patch?

Well, ideally yes, but if it's not actually *secure* then there's no
point --- and I don't believe that the approach of making readfuncs.c
secure against malicious input has the proverbial snowball's chance
of ever being bulletproof.

I don't really see how it could be *impossible* to securely parse text
input. It's certainly possible not to crash on trivially malformed
input. Completely validating the input MIGHT cost more in performance
than we want to pay in CPU cycles, but I guess I'm not seeing why it
would be an unsolvable problem apart from that.

[ thinks for awhile... ]  I wonder whether there is any way of locking
down pg_get_expr so that it throws an error if called with anything
except a suitable field from one of the system catalogs.  There are only
a few usage patterns that we need to allow, no?  At least in recent PG
versions it is possible for the function to check that its input
expression is a Var.  If we had some (probably horridly ugly) way to
obtain the rangetable entry the Var refers to, we could put code into
pg_get_expr to barf if it's not used in a context like
"select pg_get_expr(adbin) from pg_attrdef".

That's sort of clever... in a really ugly sort of way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

I wrote:

[ thinks for awhile... ] I wonder whether there is any way of locking
down pg_get_expr so that it throws an error if called with anything
except a suitable field from one of the system catalogs.

I did a bit of research into this idea. It looks at least somewhat
feasible:

* All PG versions back to 7.4 will pass the calling expression tree via
fcinfo->flinfo->fn_expr. Lack of that would be a showstopper, because
we can't change the FmgrInfo struct in back branches (ABI break). So we
can get the arguments and determine whether they are Vars.

* To determine which table a Var actually refers to, we must have access
to the rangetable, and in join cases we also need access to the plan
tree node containing the Var (since we have to drill down to the plan
node's inputs to resolve OUTER and INNER references). The rangetable is
reachable from the PlanState node, so it's sufficient to make that one
pointer available to functions. The obvious way to handle this is to
add a field to FmgrInfo, and I would suggest doing it that way as of
9.0. In the back branches, we could probably hack it without an ABI
break by having fn_expr point to some intermediate node type that
contains both the actual expression tree and the PlanState pointer
(probably, we'd make it point to FuncExprState instead of directly to
the FuncExpr, and then add the field to FuncExprState, which is far less
dangerous than redefining struct FmgrInfo). Now this depends on the
assumption that no external functions are doing anything with fn_expr
except passing it to the related fmgr.c routines, which we would teach
about this convention.

* Once we've got the above data it's a simple matter of adapting the
Var-interpretation logic used by EXPLAIN in order to find out the table
OID and column number, if any, represented by the Var. I'd suggest
adding such functions in fmgr.c to extend the API currently offered by
get_fn_expr_argtype() and friends. It seems possible that other
functions besides pg_get_expr might have use for such a capability.

What I'm suggesting is definitely not a trivial patch, and I would never
consider back-patching it if it weren't a security matter. But I think
it's doable and we'd be fixing the hole with a determinate amount of
work, whereas trying to verify the validity of pg_get_expr input
directly would be a never-ending source of more security bugs.

Comments?

regards, tom lane

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 9, 2010 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, ideally yes, but if it's not actually *secure* then there's no
point --- and I don't believe that the approach of making readfuncs.c
secure against malicious input has the proverbial snowball's chance
of ever being bulletproof.

I don't really see how it could be *impossible* to securely parse text
input. It's certainly possible not to crash on trivially malformed
input.

The operative word in that claim is "trivial". The problem that I see
is that there are many assumptions in the system about the structure and
interrelationships of expression node trees, for instance that certain
List fields contain only certain node types. I don't believe that it's
practical to make the node reading code enforce every one of those
assumptions, or that it'd be maintainable if we did manage to get it
right to start with. Certainly we can make the node reading code do
more checking than it does now, but the odds of making things
bulletproof against malicious input are negligible. I don't want to be
going back to fix another hole every other month for the lifetime of the
project, but that's exactly what we'll be doing if we try to fix it that
way.

regards, tom lane

#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#13)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On 10/06/10 00:24, Tom Lane wrote:

I wrote:

[ thinks for awhile... ] I wonder whether there is any way of locking
down pg_get_expr so that it throws an error if called with anything
except a suitable field from one of the system catalogs.

I did a bit of research into this idea. It looks at least somewhat
feasible:

* All PG versions back to 7.4 will pass the calling expression tree via
fcinfo->flinfo->fn_expr. Lack of that would be a showstopper, because
we can't change the FmgrInfo struct in back branches (ABI break). So we
can get the arguments and determine whether they are Vars.

* To determine which table a Var actually refers to, we must have access
to the rangetable, and in join cases we also need access to the plan
tree node containing the Var (since we have to drill down to the plan
node's inputs to resolve OUTER and INNER references). The rangetable is
reachable from the PlanState node, so it's sufficient to make that one
pointer available to functions. The obvious way to handle this is to
add a field to FmgrInfo, and I would suggest doing it that way as of
9.0. In the back branches, we could probably hack it without an ABI
break by having fn_expr point to some intermediate node type that
contains both the actual expression tree and the PlanState pointer
(probably, we'd make it point to FuncExprState instead of directly to
the FuncExpr, and then add the field to FuncExprState, which is far less
dangerous than redefining struct FmgrInfo). Now this depends on the
assumption that no external functions are doing anything with fn_expr
except passing it to the related fmgr.c routines, which we would teach
about this convention.

* Once we've got the above data it's a simple matter of adapting the
Var-interpretation logic used by EXPLAIN in order to find out the table
OID and column number, if any, represented by the Var. I'd suggest
adding such functions in fmgr.c to extend the API currently offered by
get_fn_expr_argtype() and friends. It seems possible that other
functions besides pg_get_expr might have use for such a capability.

What I'm suggesting is definitely not a trivial patch, and I would never
consider back-patching it if it weren't a security matter. But I think
it's doable and we'd be fixing the hole with a determinate amount of
work, whereas trying to verify the validity of pg_get_expr input
directly would be a never-ending source of more security bugs.

Yeah, seems like it would work.

You could avoid changing the meaning of fn_expr by putting the check in
the parse analysis phase, into transformFuncCall(). That would feel
safer at least for back-branches.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#16Florian Pflug
fgp@phlo.org
In reply to: Heikki Linnakangas (#15)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On Jun 15, 2010, at 9:31 , Heikki Linnakangas wrote:

You could avoid changing the meaning of fn_expr by putting the check in the parse analysis phase, into transformFuncCall(). That would feel safer at least for back-branches.

For 9.0, wouldn't a cleaner way to accomplish this be a seperate type for expressions, say pg_expr, instead of storing them as text? With an input function that unconditionally raises and error and no cast to pg_expr, creating new instances of that type would be impossible for normal users. The output function and casts to text would call pg_get_expr() with zero as the second argument.

The internal representation wouldn't change, it's just the type's OID in the catalog that'd be different.

best regards,
Florian Pflug

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#15)
1 attachment(s)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On 15/06/10 10:31, Heikki Linnakangas wrote:

You could avoid changing the meaning of fn_expr by putting the check in
the parse analysis phase, into transformFuncCall(). That would feel
safer at least for back-branches.

Here's a patch using that approach.

I grepped through PostgreSQL and pgadmin source code to find the system
columns where valid node-strings are stored:

pg_index.indexprs
pg_index.indprep
pg_attrdef.adbin
pg_proc.proargdefaults
pg_constraint.conbin

Am I missing anything?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

restrict-pg_get_expr-1.patchtext/x-diff; name=restrict-pg_get_expr-1.patchDownload
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 5e60374..7c375a9 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -16,6 +16,9 @@
 #include "postgres.h"
 
 #include "catalog/pg_type.h"
+#include "catalog/pg_attrdef.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_proc.h"
 #include "commands/dbcommands.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -30,6 +33,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/xml.h"
 
@@ -1210,6 +1214,7 @@ transformFuncCall(ParseState *pstate, FuncCall *fn)
 {
 	List	   *targs;
 	ListCell   *args;
+	Node	   *result;
 
 	/* Transform the list of arguments ... */
 	targs = NIL;
@@ -1220,7 +1225,7 @@ transformFuncCall(ParseState *pstate, FuncCall *fn)
 	}
 
 	/* ... and hand off to ParseFuncOrColumn */
-	return ParseFuncOrColumn(pstate,
+	result = ParseFuncOrColumn(pstate,
 							 fn->funcname,
 							 targs,
 							 fn->agg_order,
@@ -1230,6 +1235,58 @@ transformFuncCall(ParseState *pstate, FuncCall *fn)
 							 fn->over,
 							 false,
 							 fn->location);
+
+	/* FIXME explain why this hack is needed */
+	if (result && IsA(result, FuncExpr) && !superuser())
+	{
+		FuncExpr *fe = (FuncExpr *) result;
+		if (fe->funcid == F_PG_GET_EXPR || fe->funcid == F_PG_GET_EXPR_EXT)
+		{
+			Expr *arg = linitial(fe->args);
+			bool allowed = false;
+
+			/*
+			 * Check that the argument came directly from one of the
+			 * allowed system catalog columns
+			 */
+			if (IsA(arg, Var))
+			{
+				Var *var = (Var *) arg;
+				RangeTblEntry *rte;
+
+				rte = GetRTEByRangeTablePosn(pstate,
+											 var->varno, var->varlevelsup);
+
+				switch(rte->relid)
+				{
+					case IndexRelationId:
+						if (var->varattno == Anum_pg_index_indexprs ||
+							var->varattno == Anum_pg_index_indpred)
+							allowed = true;
+						break;
+
+					case AttrDefaultRelationId:
+						if (var->varattno == Anum_pg_attrdef_adbin)
+							allowed = true;
+						break;
+
+					case ProcedureRelationId:
+						if (var->varattno == Anum_pg_proc_proargdefaults)
+							allowed = true;
+						break;
+					case ConstraintRelationId:
+						if (var->varattno == Anum_pg_constraint_conbin)
+							allowed = true;
+						break;
+				}
+			}
+			if (!allowed)
+				ereport(ERROR,
+						(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+						 errmsg("argument to pg_get_expr() must come from system catalogs")));
+		}
+	}
+	return result;
 }
 
 static Node *
diff --git a/src/backend/tcop/fastpath.c b/src/backend/tcop/fastpath.c
index 575fa86..32c4fa9 100644
--- a/src/backend/tcop/fastpath.c
+++ b/src/backend/tcop/fastpath.c
@@ -29,6 +29,7 @@
 #include "tcop/fastpath.h"
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -347,6 +348,11 @@ HandleFunctionRequest(StringInfo msgBuf)
 		aclcheck_error(aclresult, ACL_KIND_PROC,
 					   get_func_name(fid));
 
+	if ((fid == F_PG_GET_EXPR || fid == F_PG_GET_EXPR_EXT) && !superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("argument to pg_get_expr() must come from system catalogs")));
+
 	/*
 	 * Prepare function call info block and insert arguments.
 	 */
#18Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#17)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On Mon, Jun 21, 2010 at 7:50 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 15/06/10 10:31, Heikki Linnakangas wrote:

You could avoid changing the meaning of fn_expr by putting the check in
the parse analysis phase, into transformFuncCall(). That would feel
safer at least for back-branches.

Here's a patch using that approach.

I grepped through PostgreSQL and pgadmin source code to find the system
columns where valid node-strings are stored:

pg_index.indexprs
pg_index.indprep
pg_attrdef.adbin
pg_proc.proargdefaults
pg_constraint.conbin

Am I missing anything?

I think that pg_type.typdefaultbin is used by pg_dump.
pg_rewrite.ev_qual, pg_rewrite.ev_action, pg_trigger.tgqual also
contain nodeToString() output but I didn't have any luck using them
with pg_get_expr() so maybe they don't need to be included.

The only other thing I notice is that, obviously, the FIXME comment
needs to be FIXMEd before commit.

I'd still be in favor of inserting at least some basic error checks
into readfuncs.c, though just in HEAD. The restrictions implemented
here seem adequate to prevent a security vulnerability, but superusers
can still invoke those functions manually, and while superusers can
clearly crash the system in any number of ways, that doesn't seem (to
me) like an adequate justification for ignoring the return value of
strtok(). YMMV, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Florian Pflug (#16)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On 15/06/10 15:19, Florian Pflug wrote:

On Jun 15, 2010, at 9:31 , Heikki Linnakangas wrote:

You could avoid changing the meaning of fn_expr by putting the check in the parse analysis phase, into transformFuncCall(). That would feel safer at least for back-branches.

For 9.0, wouldn't a cleaner way to accomplish this be a seperate type for expressions, say pg_expr, instead of storing them as text? With an input function that unconditionally raises and error and no cast to pg_expr, creating new instances of that type would be impossible for normal users. The output function and casts to text would call pg_get_expr() with zero as the second argument.

The internal representation wouldn't change, it's just the type's OID in the catalog that'd be different.

Yeah, that would be quite elegant. I think it's too late for 9.0, but
something to consider in the future.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#18)
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()

On 23/06/10 21:36, Robert Haas wrote:

On Mon, Jun 21, 2010 at 7:50 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 15/06/10 10:31, Heikki Linnakangas wrote:

You could avoid changing the meaning of fn_expr by putting the check in
the parse analysis phase, into transformFuncCall(). That would feel
safer at least for back-branches.

Here's a patch using that approach.

I grepped through PostgreSQL and pgadmin source code to find the system
columns where valid node-strings are stored:

pg_index.indexprs
pg_index.indprep
pg_attrdef.adbin
pg_proc.proargdefaults
pg_constraint.conbin

Am I missing anything?

I think that pg_type.typdefaultbin is used by pg_dump.

Yep, added that.

pg_rewrite.ev_qual, pg_rewrite.ev_action, pg_trigger.tgqual also
contain nodeToString() output but I didn't have any luck using them
with pg_get_expr() so maybe they don't need to be included.

I left them out.

The only other thing I notice is that, obviously, the FIXME comment
needs to be FIXMEd before commit.

Fixed.

I'd still be in favor of inserting at least some basic error checks
into readfuncs.c, though just in HEAD. The restrictions implemented
here seem adequate to prevent a security vulnerability, but superusers
can still invoke those functions manually, and while superusers can
clearly crash the system in any number of ways, that doesn't seem (to
me) like an adequate justification for ignoring the return value of
strtok(). YMMV, of course.

Agreed. I'll do that as a separate patch.

Thanks for the review!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com