MAP syntax for arrays

Started by Ildar Musinover 7 years ago16 messages

i.musin@postgrespro.ru

over 7 years ago

1 attachment(s)

Hello hackers,

Recently I was working with sql arrays in postgres and it turned out
that postgres doesn't have such very convinient functional constructions
as map, reduce and filter. Currently to map function over array user has
to make a subquery like:

select u.* from
my_table,
lateral (
select array_agg(lower(elem))
from unnest(arr) as elem
) as u;

Which is not only inconvenient but not very efficient as well (see
'Demo' section below).

When I dug into the code I found that postgres already has the needed
infrastructure for implementing map for arrays; actually array coercing
already works that way (it basically maps cast function).

In the attached patch there is a simple map implementation which
introduces new expression type and syntax:

MAP(<func_name> OVER <array_expression>)

For example:

SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
?column?
-----------------
{ONE,TWO,THREE}
(1 row)

This is probably not the most useful notation and it would be better to
have syntax for mapping arbitrary expressions over array, not just
function. I'm struggling to come up with a good idea of how it should
look like. It could look something like following:

MAP(<expr> FOR <placeholder> IN <array_expressin>)

For instance:

SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);

Looking forward for community's suggestions!

Demo
----

Here is a small comparison between map and unnest/aggregate ways for
per-element processing of arrays. Given a table with 1K rows which
contains single column of text[] type. Each array contains 5/10/100
elements.

create table my_table (arr text[]);
insert into my_table
select array_agg(md5(random()::text))
from generate_series(1, 1000) as rows,
generate_series(1, 10) as elements
group by rows;

There are two scripts for pgbench. One for 'map' syntax:

select map(upper over arr) from my_table;

And one for unnest/aggregate:

select u.* from my_table,
lateral (
select array_agg(upper(elem))
from unnest(arr) as elem
) as u;

Results are:

elements per array | map (tps) | unnest/aggregate (tps)
--------------------+------------+------------------------
5 | 139.105359 | 74.434010
10 | 74.089743 | 43.622554
100 | 7.693000 | 5.325805

Apparently map is more efficient for small arrays. And as the size of
array increases the difference decreases.

I'll be glad to any input from the community. Thanks!

--
Ildar Musin
i.musin@postgrespro.ru

Attachments:

map_v1.patchtext/x-diff; name=map_v1.patchDownload

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e284fd7..85d76b2 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -886,6 +886,56 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				break;
 			}
 
+		case T_MapExpr:
+			{
+				MapExpr	   *map = (MapExpr *) node;
+				ExprState  *elemstate;
+				Oid			resultelemtype;
+
+				ExecInitExprRec(map->arrexpr, state, resv, resnull);
+
+				resultelemtype = get_element_type(map->resulttype);
+				if (!OidIsValid(resultelemtype))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("target type is not an array")));
+
+				/* Construct a sub-expression for the per-element expression */
+				elemstate = makeNode(ExprState);
+				elemstate->expr = map->elemexpr;
+				elemstate->parent = state->parent;
+				elemstate->ext_params = state->ext_params;
+				elemstate->innermost_caseval = (Datum *) palloc(sizeof(Datum));
+				elemstate->innermost_casenull = (bool *) palloc(sizeof(bool));
+
+				ExecInitExprRec(map->elemexpr, elemstate,
+								&elemstate->resvalue, &elemstate->resnull);
+
+				/* Append a DONE step and ready the subexpression */
+				scratch.opcode = EEOP_DONE;
+				ExprEvalPushStep(elemstate, &scratch);
+				ExecReadyExpr(elemstate);
+
+				scratch.opcode = EEOP_MAP;
+				scratch.d.map.elemexprstate = elemstate;
+				scratch.d.map.resultelemtype = resultelemtype;
+
+				if (elemstate)
+				{
+					/* Set up workspace for array_map */
+					scratch.d.map.amstate =
+						(ArrayMapState *) palloc0(sizeof(ArrayMapState));
+				}
+				else
+				{
+					/* Don't need workspace if there's no subexpression */
+					scratch.d.map.amstate = NULL;
+				}
+
+				ExprEvalPushStep(state, &scratch);
+				break;
+			}
+
 		case T_OpExpr:
 			{
 				OpExpr	   *op = (OpExpr *) node;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9d6e25a..b2cbc45 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -328,6 +328,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_FUNCEXPR_STRICT,
 		&&CASE_EEOP_FUNCEXPR_FUSAGE,
 		&&CASE_EEOP_FUNCEXPR_STRICT_FUSAGE,
+		&&CASE_EEOP_MAP,
 		&&CASE_EEOP_BOOL_AND_STEP_FIRST,
 		&&CASE_EEOP_BOOL_AND_STEP,
 		&&CASE_EEOP_BOOL_AND_STEP_LAST,
@@ -699,6 +700,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_MAP)
+		{
+			ExecEvalMapExpr(state, op, econtext);
+
+			EEO_NEXT();
+		}
+
 		/*
 		 * If any of its clauses is FALSE, an AND's result is FALSE regardless
 		 * of the states of the rest of the clauses, so we can stop evaluating
@@ -2230,6 +2238,33 @@ ExecEvalFuncExprStrictFusage(ExprState *state, ExprEvalStep *op,
 }
 
 /*
+ * Evaluate a MapExpr expression.
+ *
+ * Source array is in step's result variable.
+ */
+void
+ExecEvalMapExpr(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
+{
+	Datum		arraydatum;
+
+	/* NULL array -> NULL result */
+	if (*op->resnull)
+		return;
+
+	arraydatum = *op->resvalue;
+	Assert(op->d.map.elemexprstate != NULL);
+
+	/*
+	 * Use array_map to apply the sub-expression to each array element.
+	 */
+	*op->resvalue = array_map(arraydatum,
+							  op->d.map.elemexprstate,
+							  econtext,
+							  op->d.map.resultelemtype,
+							  op->d.map.amstate);
+}
+
+/*
  * Evaluate a PARAM_EXEC parameter.
  *
  * PARAM_EXEC params (internal executor parameters) are stored in the
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c045a7..92b65d2 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2621,6 +2621,31 @@ _copyFuncCall(const FuncCall *from)
 	return newnode;
 }
 
+static A_MapExpr *
+_copyAMapExpr(const A_MapExpr *from)
+{
+	A_MapExpr   *newnode = makeNode(A_MapExpr);
+
+	COPY_NODE_FIELD(funcname);
+	COPY_NODE_FIELD(arrexpr);
+	COPY_LOCATION_FIELD(location);
+
+	return newnode;
+}
+
+static MapExpr *
+_copyMapExpr(const MapExpr *from)
+{
+	MapExpr   *newnode = makeNode(MapExpr);
+
+	COPY_NODE_FIELD(elemexpr);
+	COPY_NODE_FIELD(arrexpr);
+	COPY_SCALAR_FIELD(resulttype);
+	COPY_SCALAR_FIELD(resultcollid);
+
+	return newnode;
+}
+
 static A_Star *
 _copyAStar(const A_Star *from)
 {
@@ -5496,6 +5521,12 @@ copyObjectImpl(const void *from)
 		case T_FuncCall:
 			retval = _copyFuncCall(from);
 			break;
+		case T_A_MapExpr:
+			retval = _copyAMapExpr(from);
+			break;
+		case T_MapExpr:
+			retval = _copyMapExpr(from);
+			break;
 		case T_A_Star:
 			retval = _copyAStar(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6a971d0..ba30d39 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2350,6 +2350,27 @@ _equalFuncCall(const FuncCall *a, const FuncCall *b)
 }
 
 static bool
+_equalAMapExpr(const A_MapExpr *a, const A_MapExpr *b)
+{
+	COMPARE_NODE_FIELD(funcname);
+	COMPARE_NODE_FIELD(arrexpr);
+	COMPARE_LOCATION_FIELD(location);
+
+	return true;
+}
+
+static bool
+_equalMapExpr(const MapExpr *a, const MapExpr *b)
+{
+	COMPARE_NODE_FIELD(elemexpr);
+	COMPARE_NODE_FIELD(arrexpr);
+	COMPARE_SCALAR_FIELD(resulttype);
+	COMPARE_SCALAR_FIELD(resultcollid);
+
+	return true;
+}
+
+static bool
 _equalAStar(const A_Star *a, const A_Star *b)
 {
 	return true;
@@ -3573,6 +3594,12 @@ equal(const void *a, const void *b)
 		case T_FuncCall:
 			retval = _equalFuncCall(a, b);
 			break;
+		case T_A_MapExpr:
+			retval = _equalAMapExpr(a, b);
+			break;
+		case T_MapExpr:
+			retval = _equalMapExpr(a, b);
+			break;
 		case T_A_Star:
 			retval = _equalAStar(a, b);
 			break;
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 1bd2599..dfdd21b 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -600,6 +600,20 @@ makeFuncCall(List *name, List *args, int location)
 }
 
 /*
+ * makeMapExpr
+ *
+ */
+A_MapExpr *
+makeAMapExpr(List *funcname, Node *arrexpr)
+{
+	A_MapExpr *n = makeNode(A_MapExpr);
+
+	n->funcname = funcname;
+	n->arrexpr = arrexpr;
+	return n;
+}
+
+/*
  * makeGroupingSet
  *
  */
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a10014f..65bc6b1 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -80,6 +80,8 @@ exprType(const Node *expr)
 		case T_FuncExpr:
 			type = ((const FuncExpr *) expr)->funcresulttype;
 			break;
+		case T_MapExpr:
+			return ((MapExpr *) expr)->resulttype;
 		case T_NamedArgExpr:
 			type = exprType((Node *) ((const NamedArgExpr *) expr)->arg);
 			break;
@@ -750,6 +752,9 @@ exprCollation(const Node *expr)
 		case T_FuncExpr:
 			coll = ((const FuncExpr *) expr)->funccollid;
 			break;
+		case T_MapExpr:
+			coll = ((const MapExpr *) expr)->resultcollid;
+			break;
 		case T_NamedArgExpr:
 			coll = exprCollation((Node *) ((const NamedArgExpr *) expr)->arg);
 			break;
@@ -994,6 +999,9 @@ exprSetCollation(Node *expr, Oid collation)
 		case T_FuncExpr:
 			((FuncExpr *) expr)->funccollid = collation;
 			break;
+		case T_MapExpr:
+			((MapExpr *) expr)->resultcollid = collation;
+			break;
 		case T_NamedArgExpr:
 			Assert(collation == exprCollation((Node *) ((NamedArgExpr *) expr)->arg));
 			break;
@@ -1132,6 +1140,10 @@ exprSetInputCollation(Node *expr, Oid inputcollation)
 		case T_FuncExpr:
 			((FuncExpr *) expr)->inputcollid = inputcollation;
 			break;
+		case T_MapExpr:
+			exprSetInputCollation((Node *) ((MapExpr *) expr)->elemexpr,
+								  inputcollation);
+			break;
 		case T_OpExpr:
 			((OpExpr *) expr)->inputcollid = inputcollation;
 			break;
@@ -1937,6 +1949,16 @@ expression_tree_walker(Node *node,
 					return true;
 			}
 			break;
+		case T_MapExpr:
+			{
+				MapExpr	   *map = (MapExpr *) node;
+
+				if (walker((Node *) map->arrexpr, context))
+					return true;
+				if (walker((Node *) map->elemexpr, context))
+					return true;
+			}
+			break;
 		case T_NamedArgExpr:
 			return walker(((NamedArgExpr *) node)->arg, context);
 		case T_OpExpr:
@@ -2566,6 +2588,15 @@ expression_tree_mutator(Node *node,
 				return (Node *) newnode;
 			}
 			break;
+		case T_MapExpr:
+			{
+				MapExpr	   *expr = (MapExpr *) node;
+				MapExpr	   *newnode;
+
+				FLATCOPY(newnode, expr, MapExpr);
+				MUTATE(newnode->arrexpr, expr->arrexpr, Expr *);
+				return (Node *) newnode;
+			}
 		case T_NamedArgExpr:
 			{
 				NamedArgExpr *nexpr = (NamedArgExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1da9d7e..3f6578f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2793,6 +2793,27 @@ _outFuncCall(StringInfo str, const FuncCall *node)
 }
 
 static void
+_outAMapExpr(StringInfo str, const A_MapExpr *node)
+{
+	WRITE_NODE_TYPE("A_MAPEXPR");
+
+	WRITE_NODE_FIELD(funcname);
+	WRITE_NODE_FIELD(arrexpr);
+	WRITE_LOCATION_FIELD(location);
+}
+
+static void
+_outMapExpr(StringInfo str, const MapExpr *node)
+{
+	WRITE_NODE_TYPE("MAPEXPR");
+
+	WRITE_NODE_FIELD(elemexpr);
+	WRITE_NODE_FIELD(arrexpr);
+	WRITE_OID_FIELD(resulttype);
+	WRITE_OID_FIELD(resultcollid);
+}
+
+static void
 _outDefElem(StringInfo str, const DefElem *node)
 {
 	WRITE_NODE_TYPE("DEFELEM");
@@ -4278,6 +4299,12 @@ outNode(StringInfo str, const void *obj)
 			case T_FuncCall:
 				_outFuncCall(str, obj);
 				break;
+			case T_A_MapExpr:
+				_outAMapExpr(str, obj);
+				break;
+			case T_MapExpr:
+				_outMapExpr(str, obj);
+				break;
 			case T_DefElem:
 				_outDefElem(str, obj);
 				break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index babb62d..fe3027b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -563,6 +563,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <node>	func_application func_expr_common_subexpr
 %type <node>	func_expr func_expr_windowless
+%type <node>	map_expr
 %type <node>	common_table_expr
 %type <with>	with_clause opt_with_clause
 %type <list>	cte_list
@@ -652,7 +653,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
 	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
+	MAP MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
 	NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NO NONE
 	NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -13461,6 +13462,8 @@ c_expr:		columnref								{ $$ = $1; }
 				}
 			| case_expr
 				{ $$ = $1; }
+			| map_expr
+				{ $$ = $1; }
 			| func_expr
 				{ $$ = $1; }
 			| select_with_parens			%prec UMINUS
@@ -13556,6 +13559,14 @@ c_expr:		columnref								{ $$ = $1; }
 			  }
 		;
 
+map_expr:	MAP '(' type_function_name OVER a_expr ')'
+				{
+					A_MapExpr *m = makeAMapExpr(list_make1(makeString($3)), $5);
+					m->location = @1;
+					$$ = (Node *)m;
+				}
+		;
+
 func_application: func_name '(' ')'
 				{
 					$$ = (Node *) makeFuncCall($1, NIL, @1);
@@ -15423,6 +15434,7 @@ reserved_keyword:
 			| LIMIT
 			| LOCALTIME
 			| LOCALTIMESTAMP
+			| MAP
 			| NOT
 			| NULL_P
 			| OFFSET
diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index 6d34245..0021b24 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -667,6 +667,15 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 															&loccontext);
 						}
 						break;
+					case T_MapExpr:
+						{
+							MapExpr	   *map = (MapExpr *) node;
+
+							(void) expression_tree_walker((Node *) map->elemexpr,
+														  assign_collations_walker,
+														  (void *) &loccontext);
+						}
+						break;
 					default:
 
 						/*
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 385e54a..64b7ea5 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -101,6 +101,7 @@ static Node *transformAExprIn(ParseState *pstate, A_Expr *a);
 static Node *transformAExprBetween(ParseState *pstate, A_Expr *a);
 static Node *transformBoolExpr(ParseState *pstate, BoolExpr *a);
 static Node *transformFuncCall(ParseState *pstate, FuncCall *fn);
+static Node *transformMapExpr(ParseState *pstate, A_MapExpr *a_mapexpr);
 static Node *transformMultiAssignRef(ParseState *pstate, MultiAssignRef *maref);
 static Node *transformCaseExpr(ParseState *pstate, CaseExpr *c);
 static Node *transformSubLink(ParseState *pstate, SubLink *sublink);
@@ -258,6 +259,10 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 				break;
 			}
 
+		case T_A_MapExpr:
+			result = transformMapExpr(pstate, (A_MapExpr *) expr);
+			break;
+
 		case T_BoolExpr:
 			result = transformBoolExpr(pstate, (BoolExpr *) expr);
 			break;
@@ -1486,6 +1491,55 @@ transformFuncCall(ParseState *pstate, FuncCall *fn)
 }
 
 static Node *
+transformMapExpr(ParseState *pstate, A_MapExpr *a_mapexpr)
+{
+	MapExpr	   *map;
+	CaseTestExpr *placeholder;
+	Node	   *arrexpr;
+	Oid			funcId;
+	Oid			sourceElemType;
+	Oid			targetElemType;
+	FuncExpr   *elemexpr;
+	Oid			collation;
+
+	arrexpr = transformExprRecurse(pstate, a_mapexpr->arrexpr);
+
+	sourceElemType = get_element_type(exprType(arrexpr));
+	if (sourceElemType == InvalidOid)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("array expression is expected"),
+				 parser_errposition(pstate, exprLocation(a_mapexpr->arrexpr))));
+
+	collation = IsA(arrexpr, ArrayExpr) ?
+		get_typcollation(sourceElemType) :
+		exprCollation(arrexpr);
+
+	/* Create placeholder for per-element expression */
+	placeholder = makeNode(CaseTestExpr);
+	placeholder->typeId = sourceElemType;
+	placeholder->typeMod = exprTypmod(arrexpr);
+	placeholder->collation = collation;
+
+	/* Build per-element expression */
+	funcId = LookupFuncName(a_mapexpr->funcname, 1, &sourceElemType, false);
+	targetElemType = get_func_rettype(funcId);
+	elemexpr = makeFuncExpr(funcId,
+							targetElemType,
+							list_make1(placeholder),
+							InvalidOid,
+							InvalidOid,  /* will be set by parse_collate.c */
+							COERCE_EXPLICIT_CALL);
+
+	map = makeNode(MapExpr);
+	map->arrexpr = (Expr *) arrexpr;
+	map->elemexpr = (Expr *) elemexpr;
+	map->resulttype = get_array_type(targetElemType);
+
+	return (Node *) map;
+}
+
+static Node *
 transformMultiAssignRef(ParseState *pstate, MultiAssignRef *maref)
 {
 	SubLink    *sublink;
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index f7b1f77..ddad247 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -92,6 +92,11 @@ typedef enum ExprEvalOp
 	EEOP_FUNCEXPR_STRICT_FUSAGE,
 
 	/*
+	 * Evaluate map expression
+	 */
+	EEOP_MAP,
+
+	/*
 	 * Evaluate boolean AND expression, one step per subexpression. FIRST/LAST
 	 * subexpressions are special-cased for performance.  Since AND always has
 	 * at least two subexpressions, FIRST and LAST never apply to the same
@@ -318,6 +323,13 @@ typedef struct ExprEvalStep
 			int			nargs;	/* number of arguments */
 		}			func;
 
+		struct
+		{
+			ExprState  *elemexprstate;	/* null if no per-element work */
+			Oid			resultelemtype; /* element type of result array */
+			struct ArrayMapState *amstate;	/* workspace for array_map */
+		}			map;
+
 		/* for EEOP_BOOL_*_STEP */
 		struct
 		{
@@ -695,6 +707,8 @@ extern void ExecEvalFuncExprFusage(ExprState *state, ExprEvalStep *op,
 					   ExprContext *econtext);
 extern void ExecEvalFuncExprStrictFusage(ExprState *state, ExprEvalStep *op,
 							 ExprContext *econtext);
+extern void ExecEvalMapExpr(ExprState *state, ExprEvalStep *op,
+				  ExprContext *econtext);
 extern void ExecEvalParamExec(ExprState *state, ExprEvalStep *op,
 				  ExprContext *econtext);
 extern void ExecEvalParamExecParams(Bitmapset *params, EState *estate);
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 57bd52f..7dc3937 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -79,6 +79,7 @@ extern FuncExpr *makeFuncExpr(Oid funcid, Oid rettype, List *args,
 			 Oid funccollid, Oid inputcollid, CoercionForm fformat);
 
 extern FuncCall *makeFuncCall(List *name, List *args, int location);
+extern A_MapExpr * makeAMapExpr(List *funcname, Node *arrexpr);
 
 extern DefElem *makeDefElem(char *name, Node *arg, int location);
 extern DefElem *makeDefElemExtended(char *nameSpace, char *name, Node *arg,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adb159a..d34abc2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -428,6 +428,8 @@ typedef enum NodeTag
 	T_ParamRef,
 	T_A_Const,
 	T_FuncCall,
+	T_A_MapExpr,
+	T_MapExpr,
 	T_A_Star,
 	T_A_Indices,
 	T_A_Indirection,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 6390f7e..9c185c7 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -418,6 +418,17 @@ typedef struct A_ArrayExpr
 } A_ArrayExpr;
 
 /*
+ * A_MapExpr - array map expression
+ */
+typedef struct A_MapExpr
+{
+	NodeTag		type;
+	List	   *funcname;
+	Node	   *arrexpr;
+	int			location;		/* token location, or -1 if unknown */
+} A_MapExpr;
+
+/*
  * ResTarget -
  *	  result target (used in target list of pre-transformed parse trees)
  *
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f90aa7b..f108898 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -460,6 +460,18 @@ typedef struct FuncExpr
 } FuncExpr;
 
 /*
+ * MapExpr - array map expression
+ */
+typedef struct MapExpr
+{
+	NodeTag		type;
+	Expr	   *elemexpr;		/* expression representing per-element work */
+	Expr	   *arrexpr;		/* source expression of array type */
+	Oid			resulttype;		/* OID of target array type */
+	Oid			resultcollid;	/* OID of collation, or InvalidOid if none */
+} MapExpr;
+
+/*
  * NamedArgExpr - a named argument of a function
  *
  * This node type can only appear in the args list of a FuncCall or FuncExpr
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 23db401..d6c5254 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("map", MAP, RESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 7 years ago

In reply to: Ildar Musin (#1)

Re: MAP syntax for arrays

On Fri, May 4, 2018 at 6:38 PM, Ildar Musin <i.musin@postgrespro.ru> wrote:

Hello hackers,

Recently I was working with sql arrays in postgres and it turned out
that postgres doesn't have such very convinient functional constructions
as map, reduce and filter. Currently to map function over array user has
to make a subquery like:

select u.* from
my_table,
lateral (
select array_agg(lower(elem))
from unnest(arr) as elem
) as u;

Which is not only inconvenient but not very efficient as well (see
'Demo' section below).

Is there a way we can improve unnest() and array_agg() to match the
performance you have specified by let's say optimizing the cases
specially when those two are used together. Identifying that may be
some work, but will not require introducing new syntax.

When I dug into the code I found that postgres already has the needed
infrastructure for implementing map for arrays; actually array coercing
already works that way (it basically maps cast function).

In the attached patch there is a simple map implementation which
introduces new expression type and syntax:

MAP(<func_name> OVER <array_expression>)

For example:

SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
?column?
-----------------
{ONE,TWO,THREE}
(1 row)

This is probably not the most useful notation and it would be better to
have syntax for mapping arbitrary expressions over array, not just
function. I'm struggling to come up with a good idea of how it should
look like. It could look something like following:

MAP(<expr> FOR <placeholder> IN <array_expressin>)

For instance:

SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);

Looking forward for community's suggestions!

What if the expression has more than one variable, each mapping to a
different array? What if the arrays have different lengths or worse
different dimensions? This looks like the way SRFs used to work.

Instead of introducing a new syntax, is it possible to detect that
argument to a function is an array of the same type as the argument
and apply MAP automatically? In your example, upper(arr) would detect
that the input is an array of the same type as the scalar argument
type and do array_agg(upper(arr[id1], arr[id2], ...).

elements per array | map (tps) | unnest/aggregate (tps)
--------------------+------------+------------------------
5 | 139.105359 | 74.434010
10 | 74.089743 | 43.622554
100 | 7.693000 | 5.325805

Apparently map is more efficient for small arrays. And as the size of
array increases the difference decreases.

I am afraid that the way difference is diminishing with increase in
the number of elements, unnest, array_agg combination might win for
large number of elements. Have you tried that?
If we try to improve unnest, array_agg combination for small array, we
will get consistent performance without any additional syntax.
Although, I admit that query involving unnest and array_agg is not
very readable.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Ashutosh Bapat (#2)

Re: MAP syntax for arrays

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

Is there a way we can improve unnest() and array_agg() to match the
performance you have specified by let's say optimizing the cases
specially when those two are used together. Identifying that may be
some work, but will not require introducing new syntax.

+1. The first thing I thought on seeing this proposal was "I wonder
how long it will be before the SQL committee introduces some syntax
that uses the MAP keyword and breaks this".

ISTM the planner could be taught to notice the combination of unnest
and array_agg and produce a special output plan from that.

It is, however, fair to wonder whether this is worth our time to
optimize. I've not noticed a lot of people complaining about
performance of this sort of thing, at least not since we fixed
array_agg to not be O(N^2).

regards, tom lane

Ildar Musin

i.musin@postgrespro.ru

over 7 years ago

In reply to: Tom Lane (#3)

1 attachment(s)

Re: MAP syntax for arrays

Hello Tom, Ashutosh,

On 07.05.2018 18:16, Tom Lane wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

Is there a way we can improve unnest() and array_agg() to match
the performance you have specified by let's say optimizing the
cases specially when those two are used together. Identifying that
may be some work, but will not require introducing new syntax.

+1. The first thing I thought on seeing this proposal was "I wonder
how long it will be before the SQL committee introduces some syntax
that uses the MAP keyword and breaks this".

ISTM the planner could be taught to notice the combination of unnest
and array_agg and produce a special output plan from that.

It is, however, fair to wonder whether this is worth our time to
optimize. I've not noticed a lot of people complaining about
performance of this sort of thing, at least not since we fixed
array_agg to not be O(N^2).

The main point of this patch was about convenience; the performance
thing came out later just as a side effect :) Many users are familiar
with "map/reduce/filter" concept that many languages (not only
functional ones) utilized. And my though was that it would be great to
have those in postgres too.

Apparently there is also a case when unnest/array_agg may not produce
the result we're looking for. I mean multidimensional arrays. E.g.

select array_agg(x * 2)
from unnest(array[[1,2],[3,4],[5,6]]) as x;
array_agg
-----------------
{2,4,6,8,10,12}
(1 row)

select map(x * 2 for x in array[[1,2],[3,4],[5,6]]);
?column?
-----------------------
{{2,4},{6,8},{10,12}}
(1 row)

array_agg produces plain arrays no matter what the input was.

There is a new version of the patch in the attachment which introduces
arbitrary per-element expressions (instead of single function call). So
now user can specify a placeholder representing one element of the array
and use it in the expressions. Like following:

select map (pow(x, 2) - 1 for x in array[1,2,3]);
?column?
---------------
{1,3,7,15,31}
(1 row)

--
Ildar Musin
i.musin@postgrespro.ru

Attachments:

map_v2.patchtext/x-diff; name=map_v2.patchDownload

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e284fd7..85d76b2 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -886,6 +886,56 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				break;
 			}
 
+		case T_MapExpr:
+			{
+				MapExpr	   *map = (MapExpr *) node;
+				ExprState  *elemstate;
+				Oid			resultelemtype;
+
+				ExecInitExprRec(map->arrexpr, state, resv, resnull);
+
+				resultelemtype = get_element_type(map->resulttype);
+				if (!OidIsValid(resultelemtype))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("target type is not an array")));
+
+				/* Construct a sub-expression for the per-element expression */
+				elemstate = makeNode(ExprState);
+				elemstate->expr = map->elemexpr;
+				elemstate->parent = state->parent;
+				elemstate->ext_params = state->ext_params;
+				elemstate->innermost_caseval = (Datum *) palloc(sizeof(Datum));
+				elemstate->innermost_casenull = (bool *) palloc(sizeof(bool));
+
+				ExecInitExprRec(map->elemexpr, elemstate,
+								&elemstate->resvalue, &elemstate->resnull);
+
+				/* Append a DONE step and ready the subexpression */
+				scratch.opcode = EEOP_DONE;
+				ExprEvalPushStep(elemstate, &scratch);
+				ExecReadyExpr(elemstate);
+
+				scratch.opcode = EEOP_MAP;
+				scratch.d.map.elemexprstate = elemstate;
+				scratch.d.map.resultelemtype = resultelemtype;
+
+				if (elemstate)
+				{
+					/* Set up workspace for array_map */
+					scratch.d.map.amstate =
+						(ArrayMapState *) palloc0(sizeof(ArrayMapState));
+				}
+				else
+				{
+					/* Don't need workspace if there's no subexpression */
+					scratch.d.map.amstate = NULL;
+				}
+
+				ExprEvalPushStep(state, &scratch);
+				break;
+			}
+
 		case T_OpExpr:
 			{
 				OpExpr	   *op = (OpExpr *) node;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9d6e25a..b2cbc45 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -328,6 +328,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_FUNCEXPR_STRICT,
 		&&CASE_EEOP_FUNCEXPR_FUSAGE,
 		&&CASE_EEOP_FUNCEXPR_STRICT_FUSAGE,
+		&&CASE_EEOP_MAP,
 		&&CASE_EEOP_BOOL_AND_STEP_FIRST,
 		&&CASE_EEOP_BOOL_AND_STEP,
 		&&CASE_EEOP_BOOL_AND_STEP_LAST,
@@ -699,6 +700,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_MAP)
+		{
+			ExecEvalMapExpr(state, op, econtext);
+
+			EEO_NEXT();
+		}
+
 		/*
 		 * If any of its clauses is FALSE, an AND's result is FALSE regardless
 		 * of the states of the rest of the clauses, so we can stop evaluating
@@ -2230,6 +2238,33 @@ ExecEvalFuncExprStrictFusage(ExprState *state, ExprEvalStep *op,
 }
 
 /*
+ * Evaluate a MapExpr expression.
+ *
+ * Source array is in step's result variable.
+ */
+void
+ExecEvalMapExpr(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
+{
+	Datum		arraydatum;
+
+	/* NULL array -> NULL result */
+	if (*op->resnull)
+		return;
+
+	arraydatum = *op->resvalue;
+	Assert(op->d.map.elemexprstate != NULL);
+
+	/*
+	 * Use array_map to apply the sub-expression to each array element.
+	 */
+	*op->resvalue = array_map(arraydatum,
+							  op->d.map.elemexprstate,
+							  econtext,
+							  op->d.map.resultelemtype,
+							  op->d.map.amstate);
+}
+
+/*
  * Evaluate a PARAM_EXEC parameter.
  *
  * PARAM_EXEC params (internal executor parameters) are stored in the
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c045a7..c877c29 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2621,6 +2621,32 @@ _copyFuncCall(const FuncCall *from)
 	return newnode;
 }
 
+static A_MapExpr *
+_copyAMapExpr(const A_MapExpr *from)
+{
+	A_MapExpr   *newnode = makeNode(A_MapExpr);
+
+	COPY_NODE_FIELD(elemexpr);
+	COPY_STRING_FIELD(placeholder);
+	COPY_NODE_FIELD(arrexpr);
+	COPY_LOCATION_FIELD(location);
+
+	return newnode;
+}
+
+static MapExpr *
+_copyMapExpr(const MapExpr *from)
+{
+	MapExpr   *newnode = makeNode(MapExpr);
+
+	COPY_NODE_FIELD(elemexpr);
+	COPY_NODE_FIELD(arrexpr);
+	COPY_SCALAR_FIELD(resulttype);
+	COPY_SCALAR_FIELD(resultcollid);
+
+	return newnode;
+}
+
 static A_Star *
 _copyAStar(const A_Star *from)
 {
@@ -5496,6 +5522,12 @@ copyObjectImpl(const void *from)
 		case T_FuncCall:
 			retval = _copyFuncCall(from);
 			break;
+		case T_A_MapExpr:
+			retval = _copyAMapExpr(from);
+			break;
+		case T_MapExpr:
+			retval = _copyMapExpr(from);
+			break;
 		case T_A_Star:
 			retval = _copyAStar(from);
 			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6a971d0..18e2555 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2350,6 +2350,28 @@ _equalFuncCall(const FuncCall *a, const FuncCall *b)
 }
 
 static bool
+_equalAMapExpr(const A_MapExpr *a, const A_MapExpr *b)
+{
+	COMPARE_NODE_FIELD(elemexpr);
+	COMPARE_STRING_FIELD(placeholder);
+	COMPARE_NODE_FIELD(arrexpr);
+	COMPARE_LOCATION_FIELD(location);
+
+	return true;
+}
+
+static bool
+_equalMapExpr(const MapExpr *a, const MapExpr *b)
+{
+	COMPARE_NODE_FIELD(elemexpr);
+	COMPARE_NODE_FIELD(arrexpr);
+	COMPARE_SCALAR_FIELD(resulttype);
+	COMPARE_SCALAR_FIELD(resultcollid);
+
+	return true;
+}
+
+static bool
 _equalAStar(const A_Star *a, const A_Star *b)
 {
 	return true;
@@ -3573,6 +3595,12 @@ equal(const void *a, const void *b)
 		case T_FuncCall:
 			retval = _equalFuncCall(a, b);
 			break;
+		case T_A_MapExpr:
+			retval = _equalAMapExpr(a, b);
+			break;
+		case T_MapExpr:
+			retval = _equalMapExpr(a, b);
+			break;
 		case T_A_Star:
 			retval = _equalAStar(a, b);
 			break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a10014f..95a3844 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -80,6 +80,8 @@ exprType(const Node *expr)
 		case T_FuncExpr:
 			type = ((const FuncExpr *) expr)->funcresulttype;
 			break;
+		case T_MapExpr:
+			return ((MapExpr *) expr)->resulttype;
 		case T_NamedArgExpr:
 			type = exprType((Node *) ((const NamedArgExpr *) expr)->arg);
 			break;
@@ -750,6 +752,9 @@ exprCollation(const Node *expr)
 		case T_FuncExpr:
 			coll = ((const FuncExpr *) expr)->funccollid;
 			break;
+		case T_MapExpr:
+			coll = ((const MapExpr *) expr)->resultcollid;
+			break;
 		case T_NamedArgExpr:
 			coll = exprCollation((Node *) ((const NamedArgExpr *) expr)->arg);
 			break;
@@ -994,6 +999,9 @@ exprSetCollation(Node *expr, Oid collation)
 		case T_FuncExpr:
 			((FuncExpr *) expr)->funccollid = collation;
 			break;
+		case T_MapExpr:
+			((MapExpr *) expr)->resultcollid = collation;
+			break;
 		case T_NamedArgExpr:
 			Assert(collation == exprCollation((Node *) ((NamedArgExpr *) expr)->arg));
 			break;
@@ -1132,6 +1140,10 @@ exprSetInputCollation(Node *expr, Oid inputcollation)
 		case T_FuncExpr:
 			((FuncExpr *) expr)->inputcollid = inputcollation;
 			break;
+		case T_MapExpr:
+			exprSetInputCollation((Node *) ((MapExpr *) expr)->elemexpr,
+								  inputcollation);
+			break;
 		case T_OpExpr:
 			((OpExpr *) expr)->inputcollid = inputcollation;
 			break;
@@ -1937,6 +1949,16 @@ expression_tree_walker(Node *node,
 					return true;
 			}
 			break;
+		case T_MapExpr:
+			{
+				MapExpr	   *map = (MapExpr *) node;
+
+				if (walker((Node *) map->arrexpr, context))
+					return true;
+				if (walker((Node *) map->elemexpr, context))
+					return true;
+			}
+			break;
 		case T_NamedArgExpr:
 			return walker(((NamedArgExpr *) node)->arg, context);
 		case T_OpExpr:
@@ -2479,6 +2501,9 @@ expression_tree_mutator(Node *node,
 		case T_NextValueExpr:
 		case T_RangeTblRef:
 		case T_SortGroupClause:
+		case T_ColumnRef:
+		case T_ParamRef:
+		case T_A_Const:
 			return (Node *) copyObject(node);
 		case T_WithCheckOption:
 			{
@@ -2566,6 +2591,15 @@ expression_tree_mutator(Node *node,
 				return (Node *) newnode;
 			}
 			break;
+		case T_MapExpr:
+			{
+				MapExpr	   *expr = (MapExpr *) node;
+				MapExpr	   *newnode;
+
+				FLATCOPY(newnode, expr, MapExpr);
+				MUTATE(newnode->arrexpr, expr->arrexpr, Expr *);
+				return (Node *) newnode;
+			}
 		case T_NamedArgExpr:
 			{
 				NamedArgExpr *nexpr = (NamedArgExpr *) node;
@@ -3060,6 +3094,36 @@ expression_tree_mutator(Node *node,
 				return (Node *) newnode;
 			}
 			break;
+		case T_A_Expr:
+			{
+				A_Expr	   *expr = (A_Expr *) node;
+				A_Expr	   *newnode;
+
+				FLATCOPY(newnode, expr, A_Expr);
+				MUTATE(newnode->lexpr, expr->lexpr, Node *);
+				MUTATE(newnode->rexpr, expr->rexpr, Node *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_FuncCall:
+			{
+				FuncCall   *fc = (FuncCall *) node;
+				FuncCall   *newnode;
+
+				FLATCOPY(newnode, fc, FuncCall);
+				MUTATE(newnode->args, fc->args, List *);
+				return (Node *) newnode;
+			}
+		case T_TypeCast:
+			{
+				TypeCast   *tc = (TypeCast *) node;
+				TypeCast   *newnode;
+
+				FLATCOPY(newnode, tc, TypeCast);
+				MUTATE(newnode->arg, tc->arg, Node *);
+				return (Node *) newnode;
+			}
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1da9d7e..98b6740 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2793,6 +2793,28 @@ _outFuncCall(StringInfo str, const FuncCall *node)
 }
 
 static void
+_outAMapExpr(StringInfo str, const A_MapExpr *node)
+{
+	WRITE_NODE_TYPE("A_MAPEXPR");
+
+	WRITE_NODE_FIELD(elemexpr);
+	WRITE_STRING_FIELD(placeholder);
+	WRITE_NODE_FIELD(arrexpr);
+	WRITE_LOCATION_FIELD(location);
+}
+
+static void
+_outMapExpr(StringInfo str, const MapExpr *node)
+{
+	WRITE_NODE_TYPE("MAPEXPR");
+
+	WRITE_NODE_FIELD(elemexpr);
+	WRITE_NODE_FIELD(arrexpr);
+	WRITE_OID_FIELD(resulttype);
+	WRITE_OID_FIELD(resultcollid);
+}
+
+static void
 _outDefElem(StringInfo str, const DefElem *node)
 {
 	WRITE_NODE_TYPE("DEFELEM");
@@ -4278,6 +4300,12 @@ outNode(StringInfo str, const void *obj)
 			case T_FuncCall:
 				_outFuncCall(str, obj);
 				break;
+			case T_A_MapExpr:
+				_outAMapExpr(str, obj);
+				break;
+			case T_MapExpr:
+				_outMapExpr(str, obj);
+				break;
 			case T_DefElem:
 				_outDefElem(str, obj);
 				break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index babb62d..37a6e1c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -563,6 +563,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <node>	func_application func_expr_common_subexpr
 %type <node>	func_expr func_expr_windowless
+%type <node>	map_expr
 %type <node>	common_table_expr
 %type <with>	with_clause opt_with_clause
 %type <list>	cte_list
@@ -652,7 +653,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
 	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
+	MAP MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
 	NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NO NONE
 	NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -13461,6 +13462,8 @@ c_expr:		columnref								{ $$ = $1; }
 				}
 			| case_expr
 				{ $$ = $1; }
+			| map_expr
+				{ $$ = $1; }
 			| func_expr
 				{ $$ = $1; }
 			| select_with_parens			%prec UMINUS
@@ -13556,6 +13559,17 @@ c_expr:		columnref								{ $$ = $1; }
 			  }
 		;
 
+map_expr:	MAP '(' a_expr FOR IDENT IN_P a_expr ')'
+				{
+					A_MapExpr *m = makeNode(A_MapExpr);
+					m->elemexpr = $3;
+					m->placeholder = $5;
+					m->arrexpr = $7;
+					m->location = @1;
+					$$ = (Node *)m;
+				}
+		;
+
 func_application: func_name '(' ')'
 				{
 					$$ = (Node *) makeFuncCall($1, NIL, @1);
@@ -15423,6 +15437,7 @@ reserved_keyword:
 			| LIMIT
 			| LOCALTIME
 			| LOCALTIMESTAMP
+			| MAP
 			| NOT
 			| NULL_P
 			| OFFSET
diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index 6d34245..0021b24 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -667,6 +667,15 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 															&loccontext);
 						}
 						break;
+					case T_MapExpr:
+						{
+							MapExpr	   *map = (MapExpr *) node;
+
+							(void) expression_tree_walker((Node *) map->elemexpr,
+														  assign_collations_walker,
+														  (void *) &loccontext);
+						}
+						break;
 					default:
 
 						/*
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 385e54a..18e7310 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -101,6 +101,9 @@ static Node *transformAExprIn(ParseState *pstate, A_Expr *a);
 static Node *transformAExprBetween(ParseState *pstate, A_Expr *a);
 static Node *transformBoolExpr(ParseState *pstate, BoolExpr *a);
 static Node *transformFuncCall(ParseState *pstate, FuncCall *fn);
+static Node *convert_placeholder_mutator(Node *node, 
+							convert_placeholder_context *context);
+static Node *transformMapExpr(ParseState *pstate, A_MapExpr *a_mapexpr);
 static Node *transformMultiAssignRef(ParseState *pstate, MultiAssignRef *maref);
 static Node *transformCaseExpr(ParseState *pstate, CaseExpr *c);
 static Node *transformSubLink(ParseState *pstate, SubLink *sublink);
@@ -135,6 +138,14 @@ static void emit_precedence_warnings(ParseState *pstate,
 						 Node *lchild, Node *rchild,
 						 int location);
 
+/* Context for convert_placeholder_mutator */
+typedef struct
+{
+	char   *placeholder;
+	Oid		typid;
+	Oid		typmod;
+	Oid		collid;
+} convert_placeholder_context;
 
 /*
  * transformExpr -
@@ -258,6 +269,10 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 				break;
 			}
 
+		case T_A_MapExpr:
+			result = transformMapExpr(pstate, (A_MapExpr *) expr);
+			break;
+
 		case T_BoolExpr:
 			result = transformBoolExpr(pstate, (BoolExpr *) expr);
 			break;
@@ -1486,6 +1501,99 @@ transformFuncCall(ParseState *pstate, FuncCall *fn)
 }
 
 static Node *
+convert_placeholder(Node *node, char *placeholder,
+					Oid typid, Oid typmod, Oid collid)
+{
+	convert_placeholder_context context;
+
+	context.placeholder = placeholder;
+	context.typid = typid;
+	context.typmod = typmod;
+	context.collid = collid;
+
+	return convert_placeholder_mutator(node, &context);
+}
+
+static Node *
+convert_placeholder_mutator(Node *node,
+							convert_placeholder_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, ColumnRef))
+	{
+		ColumnRef  *cref = (ColumnRef *) node;
+		char	   *refname;
+
+		if (list_length(cref->fields) != 1)
+			return node;
+
+		refname = strVal(linitial(cref->fields));
+
+		if (strcmp(refname, context->placeholder) == 0)
+		{
+			CaseTestExpr *placeholder;
+
+			/* Create placeholder for per-element expression */
+			placeholder = makeNode(CaseTestExpr);
+			placeholder->typeId = context->typid;
+			placeholder->typeMod = context->typmod;
+			placeholder->collation = context->collid;
+
+			return (Node *) placeholder;
+		}
+	}
+
+	return expression_tree_mutator(node,
+								   convert_placeholder_mutator,
+								   (void *) context);
+}
+
+static Node *
+transformMapExpr(ParseState *pstate, A_MapExpr *a_mapexpr)
+{
+	MapExpr	   *map;
+	CaseTestExpr *placeholder;
+	Node	   *arrexpr;
+	Oid			funcId;
+	Oid			sourceElemType;
+	Oid			targetElemType;
+	Node	   *elemexpr;
+	Oid			collation;
+
+	/* Transfotm array expression */
+	arrexpr = transformExprRecurse(pstate, a_mapexpr->arrexpr);
+
+	/* Determine element type and collation */
+	sourceElemType = get_element_type(exprType(arrexpr));
+	if (sourceElemType == InvalidOid)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("array expression is expected"),
+				 parser_errposition(pstate, exprLocation(a_mapexpr->arrexpr))));
+	collation = IsA(arrexpr, ArrayExpr) ?
+		get_typcollation(sourceElemType) :
+		exprCollation(arrexpr);
+
+	/* Replace placeholder ColumnRef by CaseTestExpr */
+	elemexpr = convert_placeholder(a_mapexpr->elemexpr,
+									a_mapexpr->placeholder,
+									sourceElemType,
+									exprTypmod(arrexpr),
+									collation);
+	/* Transform per-element expression */
+	elemexpr = transformExprRecurse(pstate, elemexpr);
+
+	/* Build resulting MapExpr */
+	map = makeNode(MapExpr);
+	map->arrexpr = (Expr *) arrexpr;
+	map->elemexpr = (Expr *) elemexpr;
+	map->resulttype = get_array_type(exprType(elemexpr));
+
+	return (Node *) map;
+}
+
+static Node *
 transformMultiAssignRef(ParseState *pstate, MultiAssignRef *maref)
 {
 	SubLink    *sublink;
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index f7b1f77..ddad247 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -92,6 +92,11 @@ typedef enum ExprEvalOp
 	EEOP_FUNCEXPR_STRICT_FUSAGE,
 
 	/*
+	 * Evaluate map expression
+	 */
+	EEOP_MAP,
+
+	/*
 	 * Evaluate boolean AND expression, one step per subexpression. FIRST/LAST
 	 * subexpressions are special-cased for performance.  Since AND always has
 	 * at least two subexpressions, FIRST and LAST never apply to the same
@@ -318,6 +323,13 @@ typedef struct ExprEvalStep
 			int			nargs;	/* number of arguments */
 		}			func;
 
+		struct
+		{
+			ExprState  *elemexprstate;	/* null if no per-element work */
+			Oid			resultelemtype; /* element type of result array */
+			struct ArrayMapState *amstate;	/* workspace for array_map */
+		}			map;
+
 		/* for EEOP_BOOL_*_STEP */
 		struct
 		{
@@ -695,6 +707,8 @@ extern void ExecEvalFuncExprFusage(ExprState *state, ExprEvalStep *op,
 					   ExprContext *econtext);
 extern void ExecEvalFuncExprStrictFusage(ExprState *state, ExprEvalStep *op,
 							 ExprContext *econtext);
+extern void ExecEvalMapExpr(ExprState *state, ExprEvalStep *op,
+				  ExprContext *econtext);
 extern void ExecEvalParamExec(ExprState *state, ExprEvalStep *op,
 				  ExprContext *econtext);
 extern void ExecEvalParamExecParams(Bitmapset *params, EState *estate);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adb159a..d34abc2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -428,6 +428,8 @@ typedef enum NodeTag
 	T_ParamRef,
 	T_A_Const,
 	T_FuncCall,
+	T_A_MapExpr,
+	T_MapExpr,
 	T_A_Star,
 	T_A_Indices,
 	T_A_Indirection,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 6390f7e..be4b2d1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -418,6 +418,18 @@ typedef struct A_ArrayExpr
 } A_ArrayExpr;
 
 /*
+ * A_MapExpr - array map expression
+ */
+typedef struct A_MapExpr
+{
+	NodeTag		type;
+	Node	   *elemexpr;
+	char	   *placeholder;
+	Node	   *arrexpr;
+	int			location;		/* token location, or -1 if unknown */
+} A_MapExpr;
+
+/*
  * ResTarget -
  *	  result target (used in target list of pre-transformed parse trees)
  *
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f90aa7b..f108898 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -460,6 +460,18 @@ typedef struct FuncExpr
 } FuncExpr;
 
 /*
+ * MapExpr - array map expression
+ */
+typedef struct MapExpr
+{
+	NodeTag		type;
+	Expr	   *elemexpr;		/* expression representing per-element work */
+	Expr	   *arrexpr;		/* source expression of array type */
+	Oid			resulttype;		/* OID of target array type */
+	Oid			resultcollid;	/* OID of collation, or InvalidOid if none */
+} MapExpr;
+
+/*
  * NamedArgExpr - a named argument of a function
  *
  * This node type can only appear in the args list of a FuncCall or FuncExpr
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 23db401..d6c5254 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -243,6 +243,7 @@ PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
+PG_KEYWORD("map", MAP, RESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
 PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)

Ildar Musin

i.musin@postgrespro.ru

over 7 years ago

In reply to: Ildar Musin (#4)

Re: MAP syntax for arrays

On 08.05.2018 15:49, Ildar Musin wrote:

select map (pow(x, 2) - 1 for x in array[1,2,3]);

Sorry, the example should be:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);
?column?
---------------
{1,3,7,15,31}
(1 row)

--
Ildar Musin
i.musin@postgrespro.ru

Chapman Flack

chap@anastigmatix.net

over 7 years ago

In reply to: Ildar Musin (#5)

Re: MAP syntax for arrays

On 05/08/2018 08:57 AM, Ildar Musin wrote:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);

I wonder how efficient an implementation would be possible strictly
as a function, without grammar changes?

While PostgreSQL certainly has extensions to and divergences from
standard SQL syntax, some historical and some recent, it seems like
there ought to be a highish bar for adding new ones; or, looking at it
another way, has this feature been proposed to the SQL committee?
It doesn't sound like a bad idea, and supporting new syntax for it
would be an easier call it if it were known to be in an SQL draft
somewhere.

-Chap

Chapman Flack

chap@anastigmatix.net

over 7 years ago

In reply to: Chapman Flack (#6)

Re: MAP syntax for arrays

On 05/08/2018 09:19 AM, Chapman Flack wrote:

While PostgreSQL certainly has extensions to and divergences from
standard SQL syntax, some historical and some recent, it seems like
there ought to be a highish bar for adding new ones; or, looking at it
another way, has this feature been proposed to the SQL committee?
It doesn't sound like a bad idea, and supporting new syntax for it
would be an easier call it if it were known to be in an SQL draft
somewhere.

There seems to have been some work at Databricks adding higher-order
function syntax to their SQL; they've chosen the name 'transform'
for 'map', and also provided 'exists', 'filter', and 'aggregate'.

https://databricks.com/blog/2017/05/24/working-with-nested-data-using-higher-order-functions-in-sql-on-databricks.html

-Chap

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

over 7 years ago

In reply to: Chapman Flack (#6)

Re: MAP syntax for arrays

On 5/8/18 09:19, Chapman Flack wrote:

On 05/08/2018 08:57 AM, Ildar Musin wrote:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);

I wonder how efficient an implementation would be possible strictly
as a function, without grammar changes?

Yeah, you can pass a function to another function (using regprocedure or
just oid), so this should be possible entirely in user space.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Peter Eisentraut (#8)

Re: MAP syntax for arrays

Peter Eisentraut wrote:

On 5/8/18 09:19, Chapman Flack wrote:

On 05/08/2018 08:57 AM, Ildar Musin wrote:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);

I wonder how efficient an implementation would be possible strictly
as a function, without grammar changes?

Yeah, you can pass a function to another function (using regprocedure or
just oid), so this should be possible entirely in user space.

How would you invoke it? It seems you'd be forced to use EXECUTE in a
plpgsql function, or a C function.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10

Andreas Karlsson

andreas@proxel.se

over 7 years ago

In reply to: Ildar Musin (#4)

Re: MAP syntax for arrays

On 05/08/2018 02:49 PM, Ildar Musin wrote:

The main point of this patch was about convenience; the performance
thing came out later just as a side effect :) Many users are familiar
with "map/reduce/filter" concept that many languages (not only
functional ones) utilized. And my though was that it would be great to
have those in postgres too.

Map is a feature I have quite often wished to have, but I am not sure it
is quite useful enough to be worth adding our own proprietary syntax. It
would be a pain if the SQL committee started using MAP for something.

Andreas

#11

Andrew Gierth

andrew@tao11.riddles.org.uk

over 7 years ago

In reply to: Andreas Karlsson (#10)

Re: MAP syntax for arrays

"Andreas" == Andreas Karlsson <andreas@proxel.se> writes:

Andreas> It would be a pain if the SQL committee started using MAP for
Andreas> something.

They already did - MAP is a non-reserved keyword in sql2016, used at
least with <user-defined ordering definition>. Can't see any obvious
conflict with use in expressions, but I haven't checked all the
references.

--
Andrew (irc:RhodiumToad)

#12

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

over 7 years ago

In reply to: Alvaro Herrera (#9)

Re: MAP syntax for arrays

On 5/8/18 10:18, Alvaro Herrera wrote:

Peter Eisentraut wrote:

On 5/8/18 09:19, Chapman Flack wrote:

On 05/08/2018 08:57 AM, Ildar Musin wrote:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);

I wonder how efficient an implementation would be possible strictly
as a function, without grammar changes?

Yeah, you can pass a function to another function (using regprocedure or
just oid), so this should be possible entirely in user space.

How would you invoke it? It seems you'd be forced to use EXECUTE in a
plpgsql function, or a C function.

Yes, I was thinking about a C function.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Peter Eisentraut (#12)

Re: MAP syntax for arrays

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 5/8/18 10:18, Alvaro Herrera wrote:

How would you invoke it? It seems you'd be forced to use EXECUTE in a
plpgsql function, or a C function.

Yes, I was thinking about a C function.

The thing actually implementing MAP would presumably be in C,
so this doesn't seem like a problem technically. But having to
create a function seems like a big usability stumbling block,
probably a big enough one to make the "select array_agg(expression)
from unnest(something)" approach more attractive.

I do see the usability benefit of a dedicated MAP syntax --- I'm
just afraid of getting out in front of the SQL committee on such
things. I doubt that it's enough nicer than the sub-select way
to justify risking future standards-compliance issues.

Realistically, we're talking about this:

select a, b, (select array_agg(x*2) from unnest(arraycol) x)
from ...

versus something on the order of this:

select a, b, map(x*2 over x from arraycol)
from ...

Yeah, it's a bit shorter, but not that much ... and there's a lot
more you can do with the sub-select syntax, eg add a WHERE filter.

regards, tom lane

#14

Ildar Musin

i.musin@postgrespro.ru

over 7 years ago

In reply to: Peter Eisentraut (#8)

Re: MAP syntax for arrays

On 08.05.2018 17:15, Peter Eisentraut wrote:

On 5/8/18 09:19, Chapman Flack wrote:

On 05/08/2018 08:57 AM, Ildar Musin wrote:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);

I wonder how efficient an implementation would be possible
strictly as a function, without grammar changes?

Yeah, you can pass a function to another function (using regprocedure
or just oid), so this should be possible entirely in user space.

The problem with this approach is that extension should either have
single map() function with input and output type of anyarray which
cannot be used when user needs to map int[] to text[] for example. Or
the other way there should be a set of map functions for different
intput/output types.

Another thing is that this approach is not as versatile since user need
to create a function before running map, while with the proposed patch
they could run arbitrary expression over the array directly.

--
Ildar Musin
i.musin@postgrespro.ru

#15

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Andrew Gierth (#11)

Re: MAP syntax for arrays

Andrew Gierth wrote:

"Andreas" == Andreas Karlsson <andreas@proxel.se> writes:

Andreas> It would be a pain if the SQL committee started using MAP for
Andreas> something.

They already did - MAP is a non-reserved keyword in sql2016, used at
least with <user-defined ordering definition>. Can't see any obvious
conflict with use in expressions, but I haven't checked all the
references.

Ah, so in SQL2011 (and 2016) there already is a designator for routines,
which was a thing we were lacking previously, as I recall.

<specific routine designator> ::=
SPECIFIC <routine type> <specific name>
| <routine type> <member name> [ FOR <schema-resolved user-defined type name> ]

<member name> ::=
<member name alternatives> [ <data type list> ]

[elsewhere]

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16

Heikki Linnakangas

hlinnaka@iki.fi

over 7 years ago

In reply to: Ildar Musin (#14)

Re: MAP syntax for arrays

On 08/05/18 18:11, Ildar Musin wrote:

On 08.05.2018 17:15, Peter Eisentraut wrote:

On 5/8/18 09:19, Chapman Flack wrote:

On 05/08/2018 08:57 AM, Ildar Musin wrote:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);

I wonder how efficient an implementation would be possible
strictly as a function, without grammar changes?

Yeah, you can pass a function to another function (using regprocedure
or just oid), so this should be possible entirely in user space.

The problem with this approach is that extension should either have
single map() function with input and output type of anyarray which
cannot be used when user needs to map int[] to text[] for example. Or
the other way there should be a set of map functions for different
intput/output types.

Another thing is that this approach is not as versatile since user need
to create a function before running map, while with the proposed patch
they could run arbitrary expression over the array directly.

The consensus on this seems to be that we don't want this. Yeah, it's a
handy syntax, but it's not that much better than using a subselect or a
writing a user-defined function. And there's risk of future conflicts
with the SQL standard. I'll mark this as "Rejected" in the commitfest.

- Heikki