Functional dependencies and GROUP BY

Started by Peter Eisentrautover 15 years ago38 messages

peter_e@gmx.net

over 15 years ago

1 attachment(s)

I have developed a patch that partially implements the "functional
dependency" feature that allows some columns to be omitted from the
GROUP BY clause if it can be shown that the columns are functionally
dependent on the columns in the group by clause and therefore guaranteed
to be unique per group. The full functional dependency deduction rules
are pretty big and arcane, so I concentrated on getting a useful subset
working. In particular:

When grouping by primary key, the other columns can be omitted, e.g.,

CREATE TABLE tab1 (a int PRIMARY KEY, b int);

SELECT a, b FROM tab1 GROUP BY a;

This is frequently requested by MySQL converts (and possibly others).

Also, when a column is compared with a constant, it can appear
ungrouped:

SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

For lack of a better idea, I have made it so that merge-joinable
operators qualify as equality operators. Better ideas welcome.

Other rules could be added over time (but I'm current not planning to
work on that myself).

At this point, this patch could use some review and testing with unusual
queries that break my implementation. ;-)

Attachments:

functional-deps.patchtext/x-patch; charset=UTF-8; name=functional-deps.patchDownload

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index d0c41ce..e40cc4c 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
     In this example, the columns <literal>product_id</literal>,
     <literal>p.name</literal>, and <literal>p.price</literal> must be
     in the <literal>GROUP BY</> clause since they are referenced in
-    the query select list.  (Depending on how the products
-    table is set up, name and price might be fully dependent on the
-    product ID, so the additional groupings could theoretically be
-    unnecessary, though this is not implemented.)  The column
+    the query select list (but see below).  The column
     <literal>s.units</> does not have to be in the <literal>GROUP
     BY</> list since it is only used in an aggregate expression
     (<literal>sum(...)</literal>), which represents the sales
@@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    </para>
 
    <para>
+    If the products table is set up so that,
+    say, <literal>product_id</literal> is the primary key or a
+    not-null unique constraint, then it would be enough to group
+    by <literal>product_id</literal> in the above example, since name
+    and price would be <firstterm>functionally
+    dependent</firstterm><indexterm><primary>functional
+    dependency</primary></indexterm> on the product ID, and so there
+    would be no ambiguity about which name and price value to return
+    for each product ID group.
+   </para>
+
+   <para>
     In strict SQL, <literal>GROUP BY</> can only group by columns of
     the source table but <productname>PostgreSQL</productname> extends
     this to also allow <literal>GROUP BY</> to group by columns in the
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index a4d017f..d901390 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -520,9 +520,17 @@ GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
     produces a single value computed across all the selected rows).
     When <literal>GROUP BY</literal> is present, it is not valid for
     the <command>SELECT</command> list expressions to refer to
-    ungrouped columns except within aggregate functions, since there
-    would be more than one possible value to return for an ungrouped
-    column.
+    ungrouped columns except within aggregate functions or if the
+    ungrouped column is functionally dependent on the grouped columns,
+    since there would otherwise be more than one possible value to
+    return for an ungrouped column.  A functional dependency exists if
+    the grouped columns (or a subset thereof) are the primary key or a
+    not-null unique constraint of the table containing the ungrouped
+    column.  A functional dependency also exists if the ungrouped
+    column is constrained by the <literal>WHERE</literal> clause to a
+    constant value (for example, by equality comparison with a
+    constant).  Further rules for determining functional dependencies
+    might be added in the future.
    </para>
   </refsect2>
 
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 0a69bde..bfdd7ef 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -14,6 +14,8 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
+#include "catalog/pg_index.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/tlist.h"
@@ -24,20 +26,23 @@
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
 
 
 typedef struct
 {
 	ParseState *pstate;
+	Query	   *qry;
 	List	   *groupClauses;
 	bool		have_non_var_grouping;
 	int			sublevels_up;
 } check_ungrouped_columns_context;
 
-static void check_ungrouped_columns(Node *node, ParseState *pstate,
+static void check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
 						List *groupClauses, bool have_non_var_grouping);
 static bool check_ungrouped_columns_walker(Node *node,
 							   check_ungrouped_columns_context *context);
+static bool is_functionally_dependent(List *group_clauses, Var *var, Query *qry);
 
 
 /*
@@ -408,13 +413,13 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
 	clause = (Node *) qry->targetList;
 	if (hasJoinRTEs)
 		clause = flatten_join_alias_vars(root, clause);
-	check_ungrouped_columns(clause, pstate,
+	check_ungrouped_columns(clause, pstate, qry,
 							groupClauses, have_non_var_grouping);
 
 	clause = (Node *) qry->havingQual;
 	if (hasJoinRTEs)
 		clause = flatten_join_alias_vars(root, clause);
-	check_ungrouped_columns(clause, pstate,
+	check_ungrouped_columns(clause, pstate, qry,
 							groupClauses, have_non_var_grouping);
 
 	/*
@@ -535,12 +540,13 @@ parseCheckWindowFuncs(ParseState *pstate, Query *qry)
  * way more pain than the feature seems worth.
  */
 static void
-check_ungrouped_columns(Node *node, ParseState *pstate,
+check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
 						List *groupClauses, bool have_non_var_grouping)
 {
 	check_ungrouped_columns_context context;
 
 	context.pstate = pstate;
+	context.qry = qry;
 	context.groupClauses = groupClauses;
 	context.have_non_var_grouping = have_non_var_grouping;
 	context.sublevels_up = 0;
@@ -607,16 +613,8 @@ check_ungrouped_columns_walker(Node *node,
 		 */
 		if (!context->have_non_var_grouping || context->sublevels_up != 0)
 		{
-			foreach(gl, context->groupClauses)
-			{
-				Var		   *gvar = (Var *) lfirst(gl);
-
-				if (IsA(gvar, Var) &&
-					gvar->varno == var->varno &&
-					gvar->varattno == var->varattno &&
-					gvar->varlevelsup == 0)
-					return false;		/* acceptable, we're okay */
-			}
+			if (is_functionally_dependent(context->groupClauses, var, context->qry))
+				return false;	/* acceptable, we're okay */
 		}
 
 		/* Found an ungrouped local variable; generate error message */
@@ -656,6 +654,248 @@ check_ungrouped_columns_walker(Node *node,
 }
 
 /*
+ * Check whether the attributes of the primary key or a not-null
+ * unique constraint of relid with range table index rteno appear as a
+ * subset of the group_clauses.  (If so, a functional dependency
+ * exists between the group clauses and any attribute of the relation,
+ * and so attributes of the relation can appear ungrouped.)
+ */
+static bool
+funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno)
+{
+	Relation	rel;
+	ListCell   *indexoidcell;
+
+	rel = heap_open(relid, AccessShareLock);
+
+	foreach(indexoidcell, RelationGetIndexList(rel))
+	{
+		Oid         indexoid = lfirst_oid(indexoidcell);
+		HeapTuple   indexTuple;
+		Form_pg_index indexStruct;
+		int         i;
+		bool		found_col;
+		bool		found_all_cols;
+
+		indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid));
+		if (!HeapTupleIsValid(indexTuple))
+			elog(ERROR, "cache lookup failed for index %u", indexoid);
+		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
+
+		if ((!indexStruct->indisunique && !indexStruct->indisprimary)
+			|| !indexStruct->indimmediate)
+			continue;
+
+		/*
+		 * Check that the group columns are a superset of the
+		 * primary key columns.
+		 */
+		for (i = 0; i < indexStruct->indnatts; i++)
+		{
+			HeapTuple       tp;
+			int2			attnum;
+			ListCell	   *gl;
+
+			attnum = indexStruct->indkey.values[i];
+			found_col = false;
+
+			tp = SearchSysCache2(ATTNUM,
+								 ObjectIdGetDatum(relid),
+								 Int16GetDatum(attnum));
+			if (HeapTupleIsValid(tp))
+			{
+				Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+				bool        attnotnull;
+
+				attnotnull = att_tup->attnotnull;
+				ReleaseSysCache(tp);
+				if (!attnotnull)
+					break;
+			}
+			else
+				break;
+
+			foreach(gl, group_clauses)
+			{
+				Var		   *gvar = (Var *) lfirst(gl);
+
+				if (IsA(gvar, Var) &&
+					gvar->varno == rteno &&
+					gvar->varattno == attnum &&
+					gvar->varlevelsup == 0)
+				{
+					found_col = true;
+					break;
+				}
+			}
+			if (!found_col)
+				break;
+		}
+		found_all_cols = (i == indexStruct->indnatts && found_col);
+
+		ReleaseSysCache(indexTuple);
+		if (found_all_cols)
+		{
+			heap_close(rel, NoLock);
+			return true;
+		}
+	}
+
+	heap_close(rel, NoLock);
+
+	return false;
+}
+
+/*
+ * Check if the node is an operator expression of the form var =
+ * constant (or vice versa).  (Note that this allows the var to be
+ * ungrouped because it will be guaranteed to be the same across all
+ * groups.)
+ */
+static bool
+is_var_equals_constant(Node *node, Var *var)
+{
+	OpExpr *oe;
+
+	if (!IsA(node, OpExpr))
+		return false;
+
+	oe = (OpExpr *) node;
+
+	if (list_length(oe->args) == 2
+		&& op_mergejoinable(oe->opno))
+	{
+		Node	   *n1 = linitial(oe->args);
+		Node	   *n2 = lsecond(oe->args);
+		Var		   *v;
+		Const	   *c;
+
+		if (IsA(n1, Var) && IsA(n2, Const))
+		{
+			v = (Var *) n1;
+			c = (Const *) n2;
+		}
+		else if (IsA(n2, Var) && IsA(n1, Const))
+		{
+			v = (Var *) n2;
+			c = (Const *) n1;
+		}
+		else
+			return false;
+
+		if (v->varno == var->varno
+			&& v->varattno == var->varattno
+			&& v->varlevelsup == 0
+			&& !c->constisnull)
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Search the top-level AND components of the join tree for a match
+ * with checkfunc.
+ */
+static bool
+funcdeps_search_and_components(Node *node, bool(checkfunc)(Node *node, Var *var),  Var *var)
+{
+	if (!node)
+		return false;
+	else if (IsA(node, FromExpr))
+	{
+		FromExpr   *fe = (FromExpr *) node;
+
+		return (funcdeps_search_and_components(fe->quals, checkfunc, var)
+				|| funcdeps_search_and_components((Node *) fe->fromlist, checkfunc, var));
+	}
+	else if (IsA(node, JoinExpr))
+	{
+		JoinExpr *je = (JoinExpr *) node;
+
+		return ((je->jointype == JOIN_INNER && funcdeps_search_and_components(je->quals, checkfunc, var))
+				|| funcdeps_search_and_components(je->larg, checkfunc, var)
+				|| funcdeps_search_and_components(je->rarg, checkfunc, var));
+	}
+	else if (IsA(node, BoolExpr))
+	{
+		BoolExpr *be = (BoolExpr *) node;
+
+		if (be->boolop == AND_EXPR)
+			return funcdeps_search_and_components((Node *) be->args, checkfunc, var);
+		else
+			return false;
+	}
+	else if (IsA(node, List))
+	{
+		List	   *list = (List *) node;
+		ListCell   *cell;
+
+		foreach(cell, list)
+		{
+			Node *n = lfirst(cell);
+			bool		check;
+
+			check = funcdeps_search_and_components(n, checkfunc, var);
+			if (check)
+				return true;
+		}
+		return false;
+	}
+	else
+		return checkfunc(node, var);
+}
+
+/*
+ * Check whether var is functionally dependent on group_clauses in the
+ * context of qry.
+ *
+ * Known functional dependencies are defined in the SQL standard.
+ * This function currently only implements a subset.
+ */
+static bool
+is_functionally_dependent(List *group_clauses, Var *var, Query *qry)
+{
+	bool		check;
+	ListCell   *gl;
+	RangeTblEntry *rte;
+
+	/*
+	 * Easy case: var appears directly in group clauses.
+	 */
+	foreach(gl, group_clauses)
+	{
+		Var		   *gvar = (Var *) lfirst(gl);
+
+		if (IsA(gvar, Var) &&
+			gvar->varno == var->varno &&
+			gvar->varattno == var->varattno &&
+			gvar->varlevelsup == 0)
+			return true;
+	}
+
+	/*
+	 * Primary key of var's table is subset of group clauses.
+	 */
+	rte = rt_fetch(var->varno, qry->rtable);
+	if (rte->rtekind == RTE_RELATION)
+	{
+		check = funcdeps_check_pk(group_clauses, rte->relid, var->varno);
+		if (check)
+			return true;
+	}
+
+	/*
+	 * Var is compared with constant in top-level AND clause.
+	 */
+	check = funcdeps_search_and_components((Node *) qry->jointree, is_var_equals_constant, var);
+	if (check)
+		return true;
+
+	return false;
+}
+
+/*
  * Create expression trees for the transition and final functions
  * of an aggregate.  These are needed so that polymorphic functions
  * can be used within an aggregate --- without the expression trees,
diff --git a/src/test/regress/expected/functional_deps.out b/src/test/regress/expected/functional_deps.out
new file mode 100644
index 0000000..5391782
--- /dev/null
+++ b/src/test/regress/expected/functional_deps.out
@@ -0,0 +1,248 @@
+-- from http://www.depesz.com/index.php/2010/04/19/getting-unique-elements/
+CREATE TABLE articles (
+    id int PRIMARY KEY,
+    keywords text,
+    title text UNIQUE NOT NULL,
+    body text UNIQUE,
+    created date
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "articles_pkey" for table "articles"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "articles_title_key" for table "articles"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "articles_body_key" for table "articles"
+CREATE TABLE articles_in_category (
+    article_id int,
+    category_id int,
+    changed date,
+    PRIMARY KEY (article_id, category_id)
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "articles_in_category_pkey" for table "articles_in_category"
+-- test functional dependencies based on primary keys/unique constraints
+-- base tables
+-- group by primary key (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by unique not null (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY title;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by unique nullable (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT id, keywords, title, body, created
+               ^
+-- group by something else (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY keywords;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT id, keywords, title, body, created
+               ^
+-- multiple tables
+-- group by primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+ERROR:  column "a.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.id, a.keywords, a.title, a.body, a.created
+               ^
+-- JOIN syntax
+-- group by left table's primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+ERROR:  column "a.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.id, a.keywords, a.title, a.body, a.created
+               ^
+-- group by right table's (composite) primary key (OK)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.category_id, aic.article_id;
+ changed 
+---------
+(0 rows)
+
+-- group by right table's partial primary key (fail)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id;
+ERROR:  column "aic.changed" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT aic.changed
+               ^
+-- example from documentation
+CREATE TABLE products (product_id int, name text, price numeric);
+CREATE TABLE sales (product_id int, units int);
+-- OK
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id, p.name, p.price;
+ product_id | name | sales 
+------------+------+-------
+(0 rows)
+
+-- fail
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+ERROR:  column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+                           ^
+ALTER TABLE products ADD PRIMARY KEY (product_id);
+NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "products_pkey" for table "products"
+-- OK now
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+ product_id | name | sales 
+------------+------+-------
+(0 rows)
+
+-- Drupal example, http://drupal.org/node/555530
+CREATE TABLE node (
+    nid SERIAL,
+    vid integer NOT NULL default '0',
+    type varchar(32) NOT NULL default '',
+    title varchar(128) NOT NULL default '',
+    uid integer NOT NULL default '0',
+    status integer NOT NULL default '1',
+    created integer NOT NULL default '0',
+    -- snip
+    PRIMARY KEY (nid, vid)
+);
+NOTICE:  CREATE TABLE will create implicit sequence "node_nid_seq" for serial column "node.nid"
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "node_pkey" for table "node"
+CREATE TABLE users (
+    uid integer NOT NULL default '0',
+    name varchar(60) NOT NULL default '',
+    pass varchar(32) NOT NULL default '',
+    -- snip
+    PRIMARY KEY (uid),
+    UNIQUE (name)
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "users_pkey" for table "users"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "users_name_key" for table "users"
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid, u.name;
+ uid | name 
+-----+------
+(0 rows)
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid;
+ uid | name 
+-----+------
+(0 rows)
+
+-- test functional dependencies based on WHERE clause
+-- var = const (OK)
+SELECT body, keywords
+FROM articles
+WHERE keywords = 'foo bar'
+GROUP BY body;
+ body | keywords 
+------+----------
+(0 rows)
+
+-- const = var (OK)
+SELECT body, keywords
+FROM articles
+WHERE 'foo bar' = keywords
+GROUP BY body;
+ body | keywords 
+------+----------
+(0 rows)
+
+-- nested in AND (OK)
+SELECT body, keywords
+FROM articles
+WHERE keywords = 'foo bar' AND body <> ''
+GROUP BY body;
+ body | keywords 
+------+----------
+(0 rows)
+
+-- nested in OR (fail)
+SELECT body, keywords
+FROM articles
+WHERE keywords = 'foo bar' OR body <> ''
+GROUP BY body;
+ERROR:  column "articles.keywords" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT body, keywords
+                     ^
+-- not equality (fail)
+SELECT body, keywords
+FROM articles
+WHERE keywords > 'foo bar'
+GROUP BY body;
+ERROR:  column "articles.keywords" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT body, keywords
+                     ^
+-- base case with joins (fail)
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+GROUP BY a.body, aic.category_id;
+ERROR:  column "a.keywords" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.body, a.keywords, aic.category_id
+                       ^
+-- OK
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE a.keywords = 'foo bar'
+GROUP BY a.body, aic.category_id;
+ body | keywords | category_id 
+------+----------+-------------
+(0 rows)
+
+-- OK
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id AND a.keywords = 'foo bar'
+GROUP BY a.body, aic.category_id;
+ body | keywords | category_id 
+------+----------+-------------
+(0 rows)
+
+-- fail
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a LEFT JOIN articles_in_category AS aic ON a.id = aic.article_id AND a.keywords = 'foo bar'
+GROUP BY a.body, aic.category_id;
+ERROR:  column "a.keywords" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.body, a.keywords, aic.category_id
+                       ^
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 7529777..191d1fe 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -84,7 +84,7 @@ test: rules
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap
+test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5f185f9..e38d5f0 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -76,6 +76,7 @@ test: union
 test: case
 test: join
 test: aggregates
+test: functional_deps
 test: transactions
 ignore: random
 test: random
diff --git a/src/test/regress/sql/functional_deps.sql b/src/test/regress/sql/functional_deps.sql
new file mode 100644
index 0000000..9272878
--- /dev/null
+++ b/src/test/regress/sql/functional_deps.sql
@@ -0,0 +1,193 @@
+-- from http://www.depesz.com/index.php/2010/04/19/getting-unique-elements/
+
+CREATE TABLE articles (
+    id int PRIMARY KEY,
+    keywords text,
+    title text UNIQUE NOT NULL,
+    body text UNIQUE,
+    created date
+);
+
+CREATE TABLE articles_in_category (
+    article_id int,
+    category_id int,
+    changed date,
+    PRIMARY KEY (article_id, category_id)
+);
+
+-- test functional dependencies based on primary keys/unique constraints
+
+-- base tables
+
+-- group by primary key (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+
+-- group by unique not null (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY title;
+
+-- group by unique nullable (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+
+-- group by something else (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY keywords;
+
+-- multiple tables
+
+-- group by primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+
+-- JOIN syntax
+
+-- group by left table's primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+
+-- group by right table's (composite) primary key (OK)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.category_id, aic.article_id;
+
+-- group by right table's partial primary key (fail)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id;
+
+
+-- example from documentation
+
+CREATE TABLE products (product_id int, name text, price numeric);
+CREATE TABLE sales (product_id int, units int);
+
+-- OK
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id, p.name, p.price;
+
+-- fail
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+
+ALTER TABLE products ADD PRIMARY KEY (product_id);
+
+-- OK now
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+
+
+-- Drupal example, http://drupal.org/node/555530
+
+CREATE TABLE node (
+    nid SERIAL,
+    vid integer NOT NULL default '0',
+    type varchar(32) NOT NULL default '',
+    title varchar(128) NOT NULL default '',
+    uid integer NOT NULL default '0',
+    status integer NOT NULL default '1',
+    created integer NOT NULL default '0',
+    -- snip
+    PRIMARY KEY (nid, vid)
+);
+
+CREATE TABLE users (
+    uid integer NOT NULL default '0',
+    name varchar(60) NOT NULL default '',
+    pass varchar(32) NOT NULL default '',
+    -- snip
+    PRIMARY KEY (uid),
+    UNIQUE (name)
+);
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid, u.name;
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid;
+
+
+-- test functional dependencies based on WHERE clause
+
+-- var = const (OK)
+SELECT body, keywords
+FROM articles
+WHERE keywords = 'foo bar'
+GROUP BY body;
+
+-- const = var (OK)
+SELECT body, keywords
+FROM articles
+WHERE 'foo bar' = keywords
+GROUP BY body;
+
+-- nested in AND (OK)
+SELECT body, keywords
+FROM articles
+WHERE keywords = 'foo bar' AND body <> ''
+GROUP BY body;
+
+-- nested in OR (fail)
+SELECT body, keywords
+FROM articles
+WHERE keywords = 'foo bar' OR body <> ''
+GROUP BY body;
+
+-- not equality (fail)
+SELECT body, keywords
+FROM articles
+WHERE keywords > 'foo bar'
+GROUP BY body;
+
+-- base case with joins (fail)
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+GROUP BY a.body, aic.category_id;
+
+-- OK
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE a.keywords = 'foo bar'
+GROUP BY a.body, aic.category_id;
+
+-- OK
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id AND a.keywords = 'foo bar'
+GROUP BY a.body, aic.category_id;
+
+-- fail
+SELECT a.body, a.keywords, aic.category_id
+FROM articles AS a LEFT JOIN articles_in_category AS aic ON a.id = aic.article_id AND a.keywords = 'foo bar'
+GROUP BY a.body, aic.category_id;

Greg Stark

gsstark@mit.edu

over 15 years ago

In reply to: Peter Eisentraut (#1)

Re: Functional dependencies and GROUP BY

On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

I have developed a patch that partially implements the "functional
dependency" feature

Nice! :)

--
greg

Hitoshi Harada

umi.tanuki@gmail.com

over 15 years ago

In reply to: Peter Eisentraut (#1)

Re: Functional dependencies and GROUP BY

2010/6/8 Peter Eisentraut <peter_e@gmx.net>:

I have developed a patch that partially implements the "functional
dependency" feature that allows some columns to be omitted from the
GROUP BY clause if it can be shown that the columns are functionally
dependent on the columns in the group by clause and therefore guaranteed
to be unique per group. The full functional dependency deduction rules
are pretty big and arcane, so I concentrated on getting a useful subset
working. In particular:

Also, when a column is compared with a constant, it can appear
ungrouped:

SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

I don't see why it should be allowed. I see the insist that y must be
unique value so it is ok to be ungrouped but the point of discussion
is far from that; Semantically y is not grouping key.

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

or there should be more complicated example including JOIN cases. I
don't believe we can detect all of such cases. If the simple case is
allowed, users don't understand why the complicated case doesn't allow
sometimes. So it'll not be consistent.

Finally, it may hide unintended bugs. ORM tools may make WHERE clause
in some conditions and don't in other conditions.

Regards,

--
Hitoshi Harada

Stephen Frost

sfrost@snowman.net

over 15 years ago

In reply to: Hitoshi Harada (#3)

Re: Functional dependencies and GROUP BY

* Hitoshi Harada (umi.tanuki@gmail.com) wrote:

I don't see why it should be allowed. I see the insist that y must be
unique value so it is ok to be ungrouped but the point of discussion
is far from that; Semantically y is not grouping key.

Ignoring the fact that it's terribly useful- isn't it part of the SQL
spec?

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

Not sure I see the issue here?

Finally, it may hide unintended bugs. ORM tools may make WHERE clause
in some conditions and don't in other conditions.

Yeah, this one I really just done buy.. If an ORM tool doesn't write
correct SQL, then it's the ORM's fault, not ours.

Thanks,

Stephen

Stephen Frost

sfrost@snowman.net

over 15 years ago

In reply to: Peter Eisentraut (#1)

Re: Functional dependencies and GROUP BY

* Peter Eisentraut (peter_e@gmx.net) wrote:

This is frequently requested by MySQL converts (and possibly others).

I'd certainly love to see it- but let's not confuse people by implying
that it would actually act the way MySQL does. It wouldn't, because
what MySQL does is alot closer to 'distinct on' and is patently insane
to boot. Again, I'd *love* to see this be done in PG, but when we
document it and tell people about it, *please* don't say it's similar in
any way to MySQL's "oh, we'll just pick a random value from the columns
that aren't group'd on" implementation.

At this point, this patch could use some review and testing with unusual
queries that break my implementation. ;-)

I'll give it a shot... :)

Thanks!

Stephen

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Peter Eisentraut (#1)

Re: Functional dependencies and GROUP BY

Peter Eisentraut <peter_e@gmx.net> writes:

I have developed a patch that partially implements the "functional
dependency" feature that allows some columns to be omitted from the
GROUP BY clause if it can be shown that the columns are functionally
dependent on the columns in the group by clause and therefore guaranteed
to be unique per group.

The main objection to this is the same one I've had all along: it makes
the syntactic validity of a query dependent on what indexes exist for
the table. At minimum, that means that enforcing the check at parse
time is the Wrong Thing.

The var-compared-with-constant case seems like a big crock. Are we
really required to provide such a thing per spec?

I'm also fairly concerned about the performance of a check implemented
this way --- it's going to do a lot of work, and do it over and over
again as it traverses the query tree. At least some of that could be
alleviated after you move the check to the planner, just by virtue of
the index information already having been acquired ... but I'd still
suggest expending more than no effort on caching the results. For
instance, given "SELECT * FROM a_very_wide_table GROUP BY pk" you
shouldn't have to prove more than once that a_very_wide_table is
grouped by its PK.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

over 15 years ago

In reply to: Greg Stark (#2)

Re: Functional dependencies and GROUP BY

2010/6/7 Greg Stark <gsstark@mit.edu>:

On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

I have developed a patch that partially implements the "functional
dependency" feature

Nice! :)

I like this idea too. It can simplify some queries and I believe - it
is very good marketing bonus for us.

Pavel

Show quoted text

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Hitoshi Harada (#3)

Re: Functional dependencies and GROUP BY

On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:

Also, when a column is compared with a constant, it can appear
ungrouped:

SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

I don't see why it should be allowed. I see the insist that y must be
unique value so it is ok to be ungrouped but the point of discussion
is far from that; Semantically y is not grouping key.

I'm not sure what your argument is. If y is uniquely determined within
each group, then it's OK for it to be ungrouped. What other criteria do
you have in mind for determining that instead? It looks like you are
going by aesthetics. ;-)

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

or there should be more complicated example including JOIN cases. I
don't believe we can detect all of such cases. If the simple case is
allowed, users don't understand why the complicated case doesn't allow
sometimes. So it'll not be consistent.

Yes, as I said, my implementation is incomplete in the sense that it
only recognizes some functional dependencies. To recognize the sort of
thing you show, you would need some kind of complex deduction or proof
engine, and that doesn't seem worthwhile, at least for me, at this
point.

Rob Wultsch

wultsch@gmail.com

over 15 years ago

In reply to: Stephen Frost (#5)

Re: Functional dependencies and GROUP BY

On Mon, Jun 7, 2010 at 6:41 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Peter Eisentraut (peter_e@gmx.net) wrote:

This is frequently requested by MySQL converts (and possibly others).

I'd certainly love to see it- but let's not confuse people by implying
that it would actually act the way MySQL does. It wouldn't, because
what MySQL does is alot closer to 'distinct on' and is patently insane
to boot. Again, I'd *love* to see this be done in PG, but when we
document it and tell people about it, *please* don't say it's similar in
any way to MySQL's "oh, we'll just pick a random value from the columns
that aren't group'd on" implementation.

Preface: I work as a MySQL DBA (yeah, yeah, laugh it up...).

It has been my experience that the vast majority of the time when a
MySQL users make use of the "fine feature" which allows them to use
unaggregated columns which is not present in the GROUP BY clause in an
aggregating query they have made an error because they do not
understand GROUP BY. I have found this lack of understanding to be
very wide spread across the MySQL developer and *DBA* communities. I
also would really hesitate to compare this useful feature to the *fine
feature* present in MySQL. Due to a long standing bug
(http://bugs.mysql.com/bug.php?id=8510) it really is not possible to
get MySQL to behave sanely. It is my impression that many programs of
significant size that interact with MySQL have errors because of this
issue and it would be good to not claim to have made Postgres
compatible.

That said, I imagine if this feature could make it into the Postgres
tree it would be very useful.

Would I be correct in assuming that while this feature would make
query planning more expensive, it would also often decrease the cost
of execution?

Best,

Rob Wultsch
wultsch@gmail.com

#10

Greg Stark

gsstark@mit.edu

over 15 years ago

In reply to: Tom Lane (#6)

Re: Functional dependencies and GROUP BY

On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

I have developed a patch that partially implements the "functional
dependency" feature that allows some columns to be omitted from the
GROUP BY clause if it can be shown that the columns are functionally
dependent on the columns in the group by clause and therefore guaranteed
to be unique per group.

The main objection to this is the same one I've had all along: it makes
the syntactic validity of a query dependent on what indexes exist for
the table. At minimum, that means that enforcing the check at parse
time is the Wrong Thing.

It also needs to ensure that the plan is invalidated if the constraint
is dropped, which I assume amounts to the same thing.

--
greg

#11

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Greg Stark (#10)

Re: Functional dependencies and GROUP BY

Greg Stark <gsstark@mit.edu> writes:

On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The main objection to this is the same one I've had all along: it makes
the syntactic validity of a query dependent on what indexes exist for
the table. �At minimum, that means that enforcing the check at parse
time is the Wrong Thing.

It also needs to ensure that the plan is invalidated if the constraint
is dropped, which I assume amounts to the same thing.

Well, no, any cached plan will get invalidated if the index goes away.
The big problem with this implementation is that you could create a
*rule* (eg a view) containing a query whose validity depends on the
existence of an index. Dropping the index will not cause the rule
to be invalidated.

Perhaps the correct fix would be to mark stored query trees as having a
dependency on the index, so that dropping the index/constraint would
force a drop of the rule too. Just pushing the check to plan time, as
I suggested yesterday, isn't a very nice fix because it would result
in the rule unexpectedly starting to fail at execution.

regards, tom lane

#12

Greg Stark

gsstark@mit.edu

over 15 years ago

In reply to: Tom Lane (#11)

Re: Functional dependencies and GROUP BY

On Tue, Jun 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, no, any cached plan will get invalidated if the index goes away.
The big problem with this implementation is that you could create a
*rule* (eg a view) containing a query whose validity depends on the
existence of an index. Dropping the index will not cause the rule
to be invalidated.

Hm, I was incorrectly thinking of this as analogous to the cases of
plans that could be optimized based on the existence of a constraint.
For example removing columns from a sort key because they're unique.
But this is different because not just the plan but the validity of
the query itself is dependent on the constraint.

--
greg

#13

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Peter Eisentraut (#8)

Re: Functional dependencies and GROUP BY

Peter Eisentraut <peter_e@gmx.net> writes:

On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

Yes, as I said, my implementation is incomplete in the sense that it
only recognizes some functional dependencies. To recognize the sort of
thing you show, you would need some kind of complex deduction or proof
engine, and that doesn't seem worthwhile, at least for me, at this
point.

The question is why bother to recognize *any* cases of this form.
I find it really semantically ugly to have the parser effectively
doing one deduction of this form when the main engine for that type
of deduction is elsewhere; so unless there is a really good argument
why we have to do this case (and NOT "it was pretty easy"), I don't
want to do it.

As far as I recall, at least 99% of the user requests for this type
of behavior, maybe 100%, would be satisfied by recognizing the
group-by-primary-key case. So I think we should do that and be happy.

regards, tom lane

#14

Marko Tiikkaja

marko.tiikkaja@cs.helsinki.fi

over 15 years ago

In reply to: Tom Lane (#13)

Re: Functional dependencies and GROUP BY

On 6/8/10 5:21 PM +0300, Tom Lane wrote:

Peter Eisentraut<peter_e@gmx.net> writes:

On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

Yes, as I said, my implementation is incomplete in the sense that it
only recognizes some functional dependencies. To recognize the sort of
thing you show, you would need some kind of complex deduction or proof
engine, and that doesn't seem worthwhile, at least for me, at this
point.

The question is why bother to recognize *any* cases of this form.
I find it really semantically ugly to have the parser effectively
doing one deduction of this form when the main engine for that type
of deduction is elsewhere; so unless there is a really good argument
why we have to do this case (and NOT "it was pretty easy"), I don't
want to do it.

As far as I recall, at least 99% of the user requests for this type
of behavior, maybe 100%, would be satisfied by recognizing the
group-by-primary-key case. So I think we should do that and be happy.

Regards,
Marko Tiikkaja

#15

Stephen Frost

sfrost@snowman.net

over 15 years ago

In reply to: Tom Lane (#11)

Re: Functional dependencies and GROUP BY

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Perhaps the correct fix would be to mark stored query trees as having a
dependency on the index, so that dropping the index/constraint would
force a drop of the rule too. Just pushing the check to plan time, as
I suggested yesterday, isn't a very nice fix because it would result
in the rule unexpectedly starting to fail at execution.

Alternatively, we could rewrite the rule (not unlike what we do for
"SELECT *") to actually add on the other implicitly grouped-by columns..
I don't know if that's better or worse than creating a dependency,
since if the constraint were dropped/changed, people might expect the
rule's output to change. Of course, as you mention, the alternative
would really be for the rule to just start failing.. Still, if I wanted
to change the constraint, it'd be alot nicer to just be able to change
it and, presuming I'm just adding a column to it or doing some other
change which wouldn't invalidate the rule, not have to drop/recreate
the rule.

Thanks,

Stephen

#16

Greg Stark

gsstark@mit.edu

over 15 years ago

In reply to: Tom Lane (#13)

Re: Functional dependencies and GROUP BY

On Tue, Jun 8, 2010 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The question is why bother to recognize *any* cases of this form.
I find it really semantically ugly to have the parser effectively
doing one deduction of this form when the main engine for that type
of deduction is elsewhere; so unless there is a really good argument
why we have to do this case (and NOT "it was pretty easy"), I don't
want to do it.

Well it does appear to be there:

4.18.11 Known functional dependencies in the result of a <where clause>
...
If AP is an equality AND-component of the <search condition> simply
contained in the <where clause> and one comparand of AP is a column
reference CR, and the other comparand of AP is a <literal>, then let
CRC be the counterpart of CR in R. {} {CRC} is a known functional
dependency in R, where {} denotes the empty set.

NOTE 43 — An SQL-implementation may also choose to recognize {}
{CRC} as a known functional dependency if the other comparand is a
deterministic expression containing no column references.
...

Since Peter's not eager to implement the whole section -- which does
seem pretty baroque -- it's up to us to draw the line where we stop
coding and declare it good enough. I think we're all agreed that
grouping by a pk is clearly the most important case. It may be
important to get some other cases just so that the PK property carries
through other clauses such as joins and group bys. But ultimately the
only thing stopping us from implementing the whole thing is our
threshold of pain for writing and maintaining the extra code.

--
greg

#17

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Stephen Frost (#15)

Re: Functional dependencies and GROUP BY

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Perhaps the correct fix would be to mark stored query trees as having a
dependency on the index, so that dropping the index/constraint would
force a drop of the rule too.

Alternatively, we could rewrite the rule (not unlike what we do for
"SELECT *") to actually add on the other implicitly grouped-by columns..
I don't know if that's better or worse than creating a dependency,
since if the constraint were dropped/changed, people might expect the
rule's output to change.

Hm. The problem with that is that one of the benefits we'd like to get
from this is an efficiency win: the generated plan ought to only group
by the PK, not uselessly sort/group by everything in the row. I suppose
we could have the planner reverse-engineer its way to that, but it seems
awfully slow and clunky to add on the extra columns and then reason our
way to removing them again.

regards, tom lane

#18

Stephen Frost

sfrost@snowman.net

over 15 years ago

In reply to: Tom Lane (#17)

Re: Functional dependencies and GROUP BY

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Hm. The problem with that is that one of the benefits we'd like to get
from this is an efficiency win: the generated plan ought to only group
by the PK, not uselessly sort/group by everything in the row. I suppose
we could have the planner reverse-engineer its way to that, but it seems
awfully slow and clunky to add on the extra columns and then reason our
way to removing them again.

That's certainly a good point. Another issue that I realized when
thinking about this again- if someone wanted to *drop* a column that's
part of a PK (since it turned out to not be necessary, for example), and
then wanted to recreate the rule based on what was stored in the
catalog, they wouldn't be able to without modifying it, and that's
certainly be annoying too.

Guess my 2c would be for creating the dependency. I really dislike the
idea of the rule just all of a sudden breaking.

Thanks,

Stephen

#19

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Tom Lane (#13)

Re: Functional dependencies and GROUP BY

On tis, 2010-06-08 at 10:21 -0400, Tom Lane wrote:

The question is why bother to recognize *any* cases of this form.
I find it really semantically ugly to have the parser effectively
doing one deduction of this form when the main engine for that type
of deduction is elsewhere; so unless there is a really good argument
why we have to do this case (and NOT "it was pretty easy"), I don't
want to do it.

Yeah, I'm not horribly attached to it. I began to structure the code to
support multiple kinds of checks, and at the end only two kinds were
reasonably doable and useful. We can remove it if no one is interested
in it, which appears to be the case.

#20

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Tom Lane (#11)

Re: Functional dependencies and GROUP BY

On tis, 2010-06-08 at 10:05 -0400, Tom Lane wrote:

Perhaps the correct fix would be to mark stored query trees as having
a
dependency on the index, so that dropping the index/constraint would
force a drop of the rule too. Just pushing the check to plan time, as
I suggested yesterday, isn't a very nice fix because it would result
in the rule unexpectedly starting to fail at execution.

There are actually pretty explicit instructions about this in the SQL
standard:

4) If QS is a <query specification> that contains an implicit or
explicit <group by clause> and that contains a column reference to a
column C in its <select list> that is not contained in an aggregated
argument of a <set function specification>, and if G is the set of
grouping columns of QS, and if the table constraint TC is needed to
conclude that G ↦ C is a known functional dependency in QS, then QS is
said to be dependent on TC.

5) If V is a view that contains a <query specification> that is
dependent on a table constraint TC, then V is said to be dependent on
TC.

So the dependency between the view/rule and the constraint/index needs
to be stored in the dependency system, and RESTRICT/CASCADE will take
effect.

#21

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Peter Eisentraut (#1)

1 attachment(s)

Re: Functional dependencies and GROUP BY

On mån, 2010-06-07 at 21:33 +0300, Peter Eisentraut wrote:

I have developed a patch that partially implements the "functional
dependency" feature that allows some columns to be omitted from the
GROUP BY clause if it can be shown that the columns are functionally
dependent on the columns in the group by clause and therefore
guaranteed to be unique per group.

Second version:

I stripped out all checks except the primary key/unique constraint
checks.

Views whose existence depends on one of those constraints get a
dependency recorded. This depends on the patch currently in the commit
fest to record not null constraints in pg_constraint, so that the
dependencies on not-null constraints can be recorded.

I haven't done any caching of index lookups yet. Some testing with
1600-column tables didn't show any effect. I'll test this a little
more.

Attachments:

functional-deps.patchtext/x-patch; charset=UTF-8; name=functional-deps.patchDownload

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index d0c41ce..e40cc4c 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
     In this example, the columns <literal>product_id</literal>,
     <literal>p.name</literal>, and <literal>p.price</literal> must be
     in the <literal>GROUP BY</> clause since they are referenced in
-    the query select list.  (Depending on how the products
-    table is set up, name and price might be fully dependent on the
-    product ID, so the additional groupings could theoretically be
-    unnecessary, though this is not implemented.)  The column
+    the query select list (but see below).  The column
     <literal>s.units</> does not have to be in the <literal>GROUP
     BY</> list since it is only used in an aggregate expression
     (<literal>sum(...)</literal>), which represents the sales
@@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    </para>
 
    <para>
+    If the products table is set up so that,
+    say, <literal>product_id</literal> is the primary key or a
+    not-null unique constraint, then it would be enough to group
+    by <literal>product_id</literal> in the above example, since name
+    and price would be <firstterm>functionally
+    dependent</firstterm><indexterm><primary>functional
+    dependency</primary></indexterm> on the product ID, and so there
+    would be no ambiguity about which name and price value to return
+    for each product ID group.
+   </para>
+
+   <para>
     In strict SQL, <literal>GROUP BY</> can only group by columns of
     the source table but <productname>PostgreSQL</productname> extends
     this to also allow <literal>GROUP BY</> to group by columns in the
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 74021e8..1d02472 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -520,8 +520,12 @@ GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
     produces a single value computed across all the selected rows).
     When <literal>GROUP BY</literal> is present, it is not valid for
     the <command>SELECT</command> list expressions to refer to
-    ungrouped columns except within aggregate functions, since there
-    would be more than one possible value to return for an ungrouped
+    ungrouped columns except within aggregate functions or if the
+    ungrouped column is functionally dependent on the grouped columns,
+    since there would otherwise be more than one possible value to
+    return for an ungrouped column.  A functional dependency exists if
+    the grouped columns (or a subset thereof) are the primary key or a
+    not-null unique constraint of the table containing the ungrouped
     column.
    </para>
   </refsect2>
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index d7a06bc..dae51ea 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam.h"
 #include "access/xact.h"
+#include "catalog/dependency.h"
 #include "catalog/namespace.h"
 #include "commands/defrem.h"
 #include "commands/tablecmds.h"
@@ -474,6 +475,27 @@ DefineView(ViewStmt *stmt, const char *queryString)
 	 */
 	CommandCounterIncrement();
 
+	if (list_length(viewParse->dependencies) > 0)
+	{
+		ObjectAddress myself, referenced;
+		ListCell *lc;
+
+		myself.classId = RelationRelationId;
+		myself.objectId = viewOid;
+		myself.objectSubId = 0;
+
+		foreach(lc, viewParse->dependencies)
+		{
+			Oid index_relid = lfirst_oid(lc);
+
+			referenced.classId = RelationRelationId;
+			referenced.objectId = index_relid;
+			referenced.objectSubId = 0;
+
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+		}
+	}
+
 	/*
 	 * The range table of 'viewParse' does not contain entries for the "OLD"
 	 * and "NEW" relations. So... add them!
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e770e89..948df43 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2253,6 +2253,7 @@ _copyQuery(Query *from)
 	COPY_NODE_FIELD(limitCount);
 	COPY_NODE_FIELD(rowMarks);
 	COPY_NODE_FIELD(setOperations);
+	COPY_NODE_FIELD(dependencies);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 5d83727..98a1249 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -877,6 +877,7 @@ _equalQuery(Query *a, Query *b)
 	COMPARE_NODE_FIELD(limitCount);
 	COMPARE_NODE_FIELD(rowMarks);
 	COMPARE_NODE_FIELD(setOperations);
+	COMPARE_NODE_FIELD(dependencies);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e7dae4b..ff6d0ad 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2006,6 +2006,7 @@ _outQuery(StringInfo str, Query *node)
 	WRITE_NODE_FIELD(limitCount);
 	WRITE_NODE_FIELD(rowMarks);
 	WRITE_NODE_FIELD(setOperations);
+	WRITE_NODE_FIELD(dependencies);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bc6e2a6..79edcce 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -218,6 +218,7 @@ _readQuery(void)
 	READ_NODE_FIELD(limitCount);
 	READ_NODE_FIELD(rowMarks);
 	READ_NODE_FIELD(setOperations);
+	READ_NODE_FIELD(dependencies);
 
 	READ_DONE();
 }
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 0a69bde..3f66888 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -14,6 +14,8 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
+#include "catalog/pg_index.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/tlist.h"
@@ -24,20 +26,23 @@
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
 
 
 typedef struct
 {
 	ParseState *pstate;
+	Query	   *qry;
 	List	   *groupClauses;
 	bool		have_non_var_grouping;
 	int			sublevels_up;
 } check_ungrouped_columns_context;
 
-static void check_ungrouped_columns(Node *node, ParseState *pstate,
+static void check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
 						List *groupClauses, bool have_non_var_grouping);
 static bool check_ungrouped_columns_walker(Node *node,
 							   check_ungrouped_columns_context *context);
+static bool funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid);
 
 
 /*
@@ -408,13 +413,13 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
 	clause = (Node *) qry->targetList;
 	if (hasJoinRTEs)
 		clause = flatten_join_alias_vars(root, clause);
-	check_ungrouped_columns(clause, pstate,
+	check_ungrouped_columns(clause, pstate, qry,
 							groupClauses, have_non_var_grouping);
 
 	clause = (Node *) qry->havingQual;
 	if (hasJoinRTEs)
 		clause = flatten_join_alias_vars(root, clause);
-	check_ungrouped_columns(clause, pstate,
+	check_ungrouped_columns(clause, pstate, qry,
 							groupClauses, have_non_var_grouping);
 
 	/*
@@ -535,12 +540,13 @@ parseCheckWindowFuncs(ParseState *pstate, Query *qry)
  * way more pain than the feature seems worth.
  */
 static void
-check_ungrouped_columns(Node *node, ParseState *pstate,
+check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
 						List *groupClauses, bool have_non_var_grouping)
 {
 	check_ungrouped_columns_context context;
 
 	context.pstate = pstate;
+	context.qry = qry;
 	context.groupClauses = groupClauses;
 	context.have_non_var_grouping = have_non_var_grouping;
 	context.sublevels_up = 0;
@@ -617,6 +623,19 @@ check_ungrouped_columns_walker(Node *node,
 					gvar->varlevelsup == 0)
 					return false;		/* acceptable, we're okay */
 			}
+
+			/* Check whether primary key of var's table is subset of group clauses. */
+			rte = rt_fetch(var->varno, context->pstate->p_rtable);
+			if (rte->rtekind == RTE_RELATION)
+			{
+				Oid			index_relid;
+
+				if (funcdeps_check_pk(context->groupClauses, rte->relid, var->varno, &index_relid))
+				{
+					context->qry->dependencies = lappend_oid(context->qry->dependencies, index_relid);
+					return false;
+				}
+			}
 		}
 
 		/* Found an ungrouped local variable; generate error message */
@@ -656,6 +675,100 @@ check_ungrouped_columns_walker(Node *node,
 }
 
 /*
+ * Check whether the attributes of the primary key or a not-null
+ * unique constraint of relid with range table index rteno appear as a
+ * subset of the group_clauses.  (If so, a functional dependency
+ * exists between the group clauses and any attribute of the relation,
+ * and so attributes of the relation can appear ungrouped.)
+ */
+static bool
+funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid)
+{
+	Relation	rel;
+	ListCell   *indexoidcell;
+
+	rel = heap_open(relid, AccessShareLock);
+
+	foreach(indexoidcell, RelationGetIndexList(rel))
+	{
+		Oid         indexoid = lfirst_oid(indexoidcell);
+		HeapTuple   indexTuple;
+		Form_pg_index indexStruct;
+		int         i;
+		bool		found_col;
+		bool		found_all_cols;
+
+		indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid));
+		if (!HeapTupleIsValid(indexTuple))
+			elog(ERROR, "cache lookup failed for index %u", indexoid);
+		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
+
+		if ((!indexStruct->indisunique && !indexStruct->indisprimary)
+			|| !indexStruct->indimmediate)
+			continue;
+
+		/*
+		 * Check that the group columns are a superset of the
+		 * primary key columns.
+		 */
+		for (i = 0; i < indexStruct->indnatts; i++)
+		{
+			HeapTuple       tp;
+			int2			attnum;
+			ListCell	   *gl;
+
+			attnum = indexStruct->indkey.values[i];
+			found_col = false;
+
+			tp = SearchSysCache2(ATTNUM,
+								 ObjectIdGetDatum(relid),
+								 Int16GetDatum(attnum));
+			if (HeapTupleIsValid(tp))
+			{
+				Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+				bool        attnotnull;
+
+				attnotnull = att_tup->attnotnull;
+				ReleaseSysCache(tp);
+				if (!attnotnull)
+					break;
+			}
+			else
+				break;
+
+			foreach(gl, group_clauses)
+			{
+				Var		   *gvar = (Var *) lfirst(gl);
+
+				if (IsA(gvar, Var) &&
+					gvar->varno == rteno &&
+					gvar->varattno == attnum &&
+					gvar->varlevelsup == 0)
+				{
+					found_col = true;
+					break;
+				}
+			}
+			if (!found_col)
+				break;
+		}
+		found_all_cols = (i == indexStruct->indnatts && found_col);
+
+		ReleaseSysCache(indexTuple);
+		if (found_all_cols)
+		{
+			heap_close(rel, NoLock);
+			*index_relid = indexoid;
+			return true;
+		}
+	}
+
+	heap_close(rel, NoLock);
+
+	return false;
+}
+
+/*
  * Create expression trees for the transition and final functions
  * of an aggregate.  These are needed so that polymorphic functions
  * can be used within an aggregate --- without the expression trees,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b591073..746f4ca 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -146,6 +146,9 @@ typedef struct Query
 
 	Node	   *setOperations;	/* set-operation tree if this is top level of
 								 * a UNION/INTERSECT/EXCEPT query */
+	List	   *dependencies;	/* list of index OIDs that are
+								 * required for the query to be
+								 * valid */
 } Query;
 
 
diff --git a/src/test/regress/expected/functional_deps.out b/src/test/regress/expected/functional_deps.out
new file mode 100644
index 0000000..87db5cb
--- /dev/null
+++ b/src/test/regress/expected/functional_deps.out
@@ -0,0 +1,192 @@
+-- from http://www.depesz.com/index.php/2010/04/19/getting-unique-elements/
+CREATE TABLE articles (
+    id int CONSTRAINT articles_pkey PRIMARY KEY,
+    keywords text,
+    title text UNIQUE NOT NULL,
+    body text UNIQUE,
+    created date
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "articles_pkey" for table "articles"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "articles_title_key" for table "articles"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "articles_body_key" for table "articles"
+CREATE TABLE articles_in_category (
+    article_id int,
+    category_id int,
+    changed date,
+    PRIMARY KEY (article_id, category_id)
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "articles_in_category_pkey" for table "articles_in_category"
+-- test functional dependencies based on primary keys/unique constraints
+-- base tables
+-- group by primary key (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by unique not null (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY title;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by unique nullable (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT id, keywords, title, body, created
+               ^
+-- group by something else (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY keywords;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT id, keywords, title, body, created
+               ^
+-- multiple tables
+-- group by primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+ERROR:  column "a.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.id, a.keywords, a.title, a.body, a.created
+               ^
+-- JOIN syntax
+-- group by left table's primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+ERROR:  column "a.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.id, a.keywords, a.title, a.body, a.created
+               ^
+-- group by right table's (composite) primary key (OK)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.category_id, aic.article_id;
+ changed 
+---------
+(0 rows)
+
+-- group by right table's partial primary key (fail)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id;
+ERROR:  column "aic.changed" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT aic.changed
+               ^
+-- example from documentation
+CREATE TABLE products (product_id int, name text, price numeric);
+CREATE TABLE sales (product_id int, units int);
+-- OK
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id, p.name, p.price;
+ product_id | name | sales 
+------------+------+-------
+(0 rows)
+
+-- fail
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+ERROR:  column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+                           ^
+ALTER TABLE products ADD PRIMARY KEY (product_id);
+NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "products_pkey" for table "products"
+-- OK now
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+ product_id | name | sales 
+------------+------+-------
+(0 rows)
+
+-- Drupal example, http://drupal.org/node/555530
+CREATE TABLE node (
+    nid SERIAL,
+    vid integer NOT NULL default '0',
+    type varchar(32) NOT NULL default '',
+    title varchar(128) NOT NULL default '',
+    uid integer NOT NULL default '0',
+    status integer NOT NULL default '1',
+    created integer NOT NULL default '0',
+    -- snip
+    PRIMARY KEY (nid, vid)
+);
+NOTICE:  CREATE TABLE will create implicit sequence "node_nid_seq" for serial column "node.nid"
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "node_pkey" for table "node"
+CREATE TABLE users (
+    uid integer NOT NULL default '0',
+    name varchar(60) NOT NULL default '',
+    pass varchar(32) NOT NULL default '',
+    -- snip
+    PRIMARY KEY (uid),
+    UNIQUE (name)
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "users_pkey" for table "users"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "users_name_key" for table "users"
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid, u.name;
+ uid | name 
+-----+------
+(0 rows)
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid;
+ uid | name 
+-----+------
+(0 rows)
+
+-- views and dependencies
+-- fail
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 2: SELECT id, keywords, title, body, created
+               ^
+-- OK
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+-- fail
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
+ERROR:  cannot drop constraint articles_pkey on table articles because other objects depend on it
+DETAIL:  view fdv1 depends on index articles_pkey
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 7529777..191d1fe 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -84,7 +84,7 @@ test: rules
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap
+test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5f185f9..e38d5f0 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -76,6 +76,7 @@ test: union
 test: case
 test: join
 test: aggregates
+test: functional_deps
 test: transactions
 ignore: random
 test: random
diff --git a/src/test/regress/sql/functional_deps.sql b/src/test/regress/sql/functional_deps.sql
new file mode 100644
index 0000000..5f0aaf8
--- /dev/null
+++ b/src/test/regress/sql/functional_deps.sql
@@ -0,0 +1,157 @@
+-- from http://www.depesz.com/index.php/2010/04/19/getting-unique-elements/
+
+CREATE TABLE articles (
+    id int CONSTRAINT articles_pkey PRIMARY KEY,
+    keywords text,
+    title text UNIQUE NOT NULL,
+    body text UNIQUE,
+    created date
+);
+
+CREATE TABLE articles_in_category (
+    article_id int,
+    category_id int,
+    changed date,
+    PRIMARY KEY (article_id, category_id)
+);
+
+-- test functional dependencies based on primary keys/unique constraints
+
+-- base tables
+
+-- group by primary key (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+
+-- group by unique not null (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY title;
+
+-- group by unique nullable (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+
+-- group by something else (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY keywords;
+
+-- multiple tables
+
+-- group by primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+
+-- JOIN syntax
+
+-- group by left table's primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+
+-- group by right table's (composite) primary key (OK)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.category_id, aic.article_id;
+
+-- group by right table's partial primary key (fail)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id;
+
+
+-- example from documentation
+
+CREATE TABLE products (product_id int, name text, price numeric);
+CREATE TABLE sales (product_id int, units int);
+
+-- OK
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id, p.name, p.price;
+
+-- fail
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+
+ALTER TABLE products ADD PRIMARY KEY (product_id);
+
+-- OK now
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+
+
+-- Drupal example, http://drupal.org/node/555530
+
+CREATE TABLE node (
+    nid SERIAL,
+    vid integer NOT NULL default '0',
+    type varchar(32) NOT NULL default '',
+    title varchar(128) NOT NULL default '',
+    uid integer NOT NULL default '0',
+    status integer NOT NULL default '1',
+    created integer NOT NULL default '0',
+    -- snip
+    PRIMARY KEY (nid, vid)
+);
+
+CREATE TABLE users (
+    uid integer NOT NULL default '0',
+    name varchar(60) NOT NULL default '',
+    pass varchar(32) NOT NULL default '',
+    -- snip
+    PRIMARY KEY (uid),
+    UNIQUE (name)
+);
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid, u.name;
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid;
+
+
+-- views and dependencies
+
+-- fail
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+
+-- OK
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+
+-- fail
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;

#22

Alex Hunsaker

badalex@gmail.com

over 15 years ago

In reply to: Peter Eisentraut (#21)

Re: Functional dependencies and GROUP BY

On Fri, Jun 25, 2010 at 14:06, Peter Eisentraut <peter_e@gmx.net> wrote:

Second version:

Hi!

Ive looked this over. Looks great! I have some nits about the
documentation and comments ( non issues like referencing primary keys
when it really means not null unique indexes :-P ), but on the whole
it works and looks good.

The only corner case I have run into is creating a view with what I
would call an implicit 'not null' constraint. Demonstration below:

create table nn (a int4 not null, b int4, unique (a));
select * from nn group by a; -- should this work? I think not?
a | b
---+---
(0 rows)

create view vv as select a, b from nn group by a;
select * from vv;
a | b
---+---
(0 rows)

=# alter table nn alter column a drop not null;
=# select * from nn group by a; -- ok, broken makes sense
ERROR: column "nn.b" must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT * from nn group by a;

=# select * from vv; -- yipes should be broken?
a | b
---+---
(0 rows)

Im thinking we should not allow the "select * from nn group by a;" to
work. Thoughts?

(FYI I do plan on doing some performance testing with large columns
later, any other requests?)

#23

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Alex Hunsaker (#22)

Re: Functional dependencies and GROUP BY

On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:

The only corner case I have run into is creating a view with what I
would call an implicit 'not null' constraint. Demonstration below:

create table nn (a int4 not null, b int4, unique (a));
select * from nn group by a; -- should this work? I think not?

I believe I referred to this upsthread. There is another patch in the
commitfest about explicitly representing NOT NULL constraints in
pg_constraint. Then this case would create a dependency on those
constraints. So we need to get that other patch in first.

#24

Alex Hunsaker

badalex@gmail.com

over 15 years ago

In reply to: Peter Eisentraut (#23)

Re: Functional dependencies and GROUP BY

On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut <peter_e@gmx.net> wrote:

On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:

The only corner case I have run into is creating a view with what I
would call an implicit 'not null' constraint. Demonstration below:

create table nn (a int4 not null, b int4, unique (a));
select * from nn group by a; -- should this work? I think not?

I believe I referred to this upsthread.

Aww, and here I thought I had just been diligent :). In other news
its really no surprise that your test with 1600 columns had little
effect. As it loops over the the indexes, then the index keys and
then the group by items right? So I would expect the more indexes you
had or group by items to slow it down. Not so much the number of
columns. Right?

Anyhow it sounds like I should try it on top of the other patch and
see if it works. I assume it might still need some fixups to work
with that other patch? Or do you expect it to just work?

#25

Alex Hunsaker

badalex@gmail.com

over 15 years ago

In reply to: Alex Hunsaker (#22)

Re: Functional dependencies and GROUP BY

On Fri, Jul 16, 2010 at 22:29, Alex Hunsaker <badalex@gmail.com> wrote:

(FYI I do plan on doing some performance testing with large columns
later, any other requests?)

And here are the results. All tests are with an empty table with 1500
int4 columns. There is a unique non null index on the first column.

(non assert build)
A: select count(1) from (select * from test group by ...1500 columns...) as res;
B: select count(1) from (select * from test group by a_0) as res; --
a_0 has the not null unique index

CVS A: 360ms
PATCH A: 370ms
PATCH B: 60ms

1500 indexes (one per column, on the column):
CVS: A: 670ms
PATCH A: 850ms
PATCH B: 561ms

So it seems for tables with lots of columns the patch is faster, at
least when you omit all the columns from the group by. I suspect for
most "normal" (5-20 columns) usage it should be a wash.

(Stupid question) Does anyone know why HEAD is quite a bit slower when
there are lots off indexes? Do we end up looping and perhaps locking
them or something?

#26

Alex Hunsaker

badalex@gmail.com

over 15 years ago

In reply to: Alex Hunsaker (#24)

Re: Functional dependencies and GROUP BY

On Sat, Jul 17, 2010 at 11:13, Alex Hunsaker <badalex@gmail.com> wrote:

On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut <peter_e@gmx.net> wrote:

On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:

The only corner case I have run into is creating a view with what I
would call an implicit 'not null' constraint. Demonstration below:

create table nn (a int4 not null, b int4, unique (a));
select * from nn group by a; -- should this work? I think not?

I believe I referred to this upsthread.

Yeah, I went back and reread the thread and um... yep its right where
you posted patch 2. I think I read it, forgot about it and then it
bubbled up to my subconscious while testing :).

Anyhow it sounds like I should try it on top of the other patch and
see if it works. I assume it might still need some fixups to work
with that other patch? Or do you expect it to just work?

[ referring to the not null pg_constraint patch ]

Probably no surprise to you, I tried it on top of the not null
pg_constraint patch and it did not work 'out of the box'. Mainly I
was curious if there was some magic going on that I did not see. In
any event do you think its worth adding a regression test for this?

#27

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Alex Hunsaker (#24)

Re: Functional dependencies and GROUP BY

On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:

its really no surprise that your test with 1600 columns had little
effect. As it loops over the the indexes, then the index keys and
then the group by items right? So I would expect the more indexes you
had or group by items to slow it down. Not so much the number of
columns. Right?

At the outer level (which is not visible in this patch) it loops over
all columns in the select list, and then it looks up the indexes each
time. So the concern was that if the select list was * with a wide
table, looking up the indexes each time would be slow.

Anyhow it sounds like I should try it on top of the other patch and
see if it works. I assume it might still need some fixups to work
with that other patch? Or do you expect it to just work?

There is some work necessary to integrate the two.

#28

Alex Hunsaker

badalex@gmail.com

over 15 years ago

In reply to: Peter Eisentraut (#27)

1 attachment(s)

Re: Functional dependencies and GROUP BY

On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut <peter_e@gmx.net> wrote:

On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:

So I would expect the more indexes you
had or group by items to slow it down. Not so much the number of
columns. Right?

At the outer level (which is not visible in this patch) it loops over
all columns in the select list, and then it looks up the indexes each
time. So the concern was that if the select list was * with a wide
table, looking up the indexes each time would be slow.

Thanks for the explanation.

Anyhow it sounds like I should try it on top of the other patch and
see if it works. I assume it might still need some fixups to work
with that other patch? Or do you expect it to just work?

There is some work necessary to integrate the two.

I just read that patch is getting pushed till at least the next commit
fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php

Should we push this patch back to? Alternatively we could make it
work with just primary keys until the other patch gets in. I think
that makes sense, find that attached. Thoughts?

Note I axed the index not null attribute checking, I have not thought
to deeply about it other than if its a primary key it cant have non
null attributes.... Right? I may have missed something subtle hence
the heads up.

Attachments:

func_deps_v3.patchtext/x-patch; charset=US-ASCII; name=func_deps_v3.patchDownload

*** a/doc/src/sgml/queries.sgml
--- b/doc/src/sgml/queries.sgml
***************
*** 886,895 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
      In this example, the columns <literal>product_id</literal>,
      <literal>p.name</literal>, and <literal>p.price</literal> must be
      in the <literal>GROUP BY</> clause since they are referenced in
!     the query select list.  (Depending on how the products
!     table is set up, name and price might be fully dependent on the
!     product ID, so the additional groupings could theoretically be
!     unnecessary, though this is not implemented.)  The column
      <literal>s.units</> does not have to be in the <literal>GROUP
      BY</> list since it is only used in an aggregate expression
      (<literal>sum(...)</literal>), which represents the sales
--- 886,892 ----
      In this example, the columns <literal>product_id</literal>,
      <literal>p.name</literal>, and <literal>p.price</literal> must be
      in the <literal>GROUP BY</> clause since they are referenced in
!     the query select list (but see below).  The column
      <literal>s.units</> does not have to be in the <literal>GROUP
      BY</> list since it is only used in an aggregate expression
      (<literal>sum(...)</literal>), which represents the sales
***************
*** 898,903 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
--- 895,912 ----
     </para>
  
     <para>
+     If the products table is set up so that,
+     say, <literal>product_id</literal> is the primary key or a
+     not-null unique constraint, then it would be enough to group
+     by <literal>product_id</literal> in the above example, since name
+     and price would be <firstterm>functionally
+     dependent</firstterm><indexterm><primary>functional
+     dependency</primary></indexterm> on the product ID, and so there
+     would be no ambiguity about which name and price value to return
+     for each product ID group.
+    </para>
+ 
+    <para>
      In strict SQL, <literal>GROUP BY</> can only group by columns of
      the source table but <productname>PostgreSQL</productname> extends
      this to also allow <literal>GROUP BY</> to group by columns in the
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
***************
*** 520,527 **** GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
      produces a single value computed across all the selected rows).
      When <literal>GROUP BY</literal> is present, it is not valid for
      the <command>SELECT</command> list expressions to refer to
!     ungrouped columns except within aggregate functions, since there
!     would be more than one possible value to return for an ungrouped
      column.
     </para>
    </refsect2>
--- 520,531 ----
      produces a single value computed across all the selected rows).
      When <literal>GROUP BY</literal> is present, it is not valid for
      the <command>SELECT</command> list expressions to refer to
!     ungrouped columns except within aggregate functions or if the
!     ungrouped column is functionally dependent on the grouped columns,
!     since there would otherwise be more than one possible value to
!     return for an ungrouped column.  A functional dependency exists if
!     the grouped columns (or a subset thereof) are the primary key or a
!     not-null unique constraint of the table containing the ungrouped
      column.
     </para>
    </refsect2>
*** a/src/backend/commands/view.c
--- b/src/backend/commands/view.c
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "access/heapam.h"
  #include "access/xact.h"
+ #include "catalog/dependency.h"
  #include "catalog/namespace.h"
  #include "commands/defrem.h"
  #include "commands/tablecmds.h"
***************
*** 474,479 **** DefineView(ViewStmt *stmt, const char *queryString)
--- 475,501 ----
  	 */
  	CommandCounterIncrement();
  
+ 	if (list_length(viewParse->dependencies) > 0)
+ 	{
+ 		ObjectAddress myself, referenced;
+ 		ListCell *lc;
+ 
+ 		myself.classId = RelationRelationId;
+ 		myself.objectId = viewOid;
+ 		myself.objectSubId = 0;
+ 
+ 		foreach(lc, viewParse->dependencies)
+ 		{
+ 			Oid index_relid = lfirst_oid(lc);
+ 
+ 			referenced.classId = RelationRelationId;
+ 			referenced.objectId = index_relid;
+ 			referenced.objectSubId = 0;
+ 
+ 			recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ 		}
+ 	}
+ 
  	/*
  	 * The range table of 'viewParse' does not contain entries for the "OLD"
  	 * and "NEW" relations. So... add them!
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***************
*** 2272,2277 **** _copyQuery(Query *from)
--- 2272,2278 ----
  	COPY_NODE_FIELD(limitCount);
  	COPY_NODE_FIELD(rowMarks);
  	COPY_NODE_FIELD(setOperations);
+ 	COPY_NODE_FIELD(dependencies);
  
  	return newnode;
  }
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***************
*** 877,882 **** _equalQuery(Query *a, Query *b)
--- 877,883 ----
  	COMPARE_NODE_FIELD(limitCount);
  	COMPARE_NODE_FIELD(rowMarks);
  	COMPARE_NODE_FIELD(setOperations);
+ 	COMPARE_NODE_FIELD(dependencies);
  
  	return true;
  }
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***************
*** 2019,2024 **** _outQuery(StringInfo str, Query *node)
--- 2019,2025 ----
  	WRITE_NODE_FIELD(limitCount);
  	WRITE_NODE_FIELD(rowMarks);
  	WRITE_NODE_FIELD(setOperations);
+ 	WRITE_NODE_FIELD(dependencies);
  }
  
  static void
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
***************
*** 218,223 **** _readQuery(void)
--- 218,224 ----
  	READ_NODE_FIELD(limitCount);
  	READ_NODE_FIELD(rowMarks);
  	READ_NODE_FIELD(setOperations);
+ 	READ_NODE_FIELD(dependencies);
  
  	READ_DONE();
  }
*** a/src/backend/parser/parse_agg.c
--- b/src/backend/parser/parse_agg.c
***************
*** 14,19 ****
--- 14,21 ----
   */
  #include "postgres.h"
  
+ #include "access/heapam.h"
+ #include "catalog/pg_index.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #include "optimizer/tlist.h"
***************
*** 24,43 ****
  #include "rewrite/rewriteManip.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  
  
  typedef struct
  {
  	ParseState *pstate;
  	List	   *groupClauses;
  	bool		have_non_var_grouping;
  	int			sublevels_up;
  } check_ungrouped_columns_context;
  
! static void check_ungrouped_columns(Node *node, ParseState *pstate,
  						List *groupClauses, bool have_non_var_grouping);
  static bool check_ungrouped_columns_walker(Node *node,
  							   check_ungrouped_columns_context *context);
  
  
  /*
--- 26,48 ----
  #include "rewrite/rewriteManip.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
  
  
  typedef struct
  {
  	ParseState *pstate;
+ 	Query	   *qry;
  	List	   *groupClauses;
  	bool		have_non_var_grouping;
  	int			sublevels_up;
  } check_ungrouped_columns_context;
  
! static void check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
  						List *groupClauses, bool have_non_var_grouping);
  static bool check_ungrouped_columns_walker(Node *node,
  							   check_ungrouped_columns_context *context);
+ static bool funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid);
  
  
  /*
***************
*** 408,420 **** parseCheckAggregates(ParseState *pstate, Query *qry)
  	clause = (Node *) qry->targetList;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate,
  							groupClauses, have_non_var_grouping);
  
  	clause = (Node *) qry->havingQual;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate,
  							groupClauses, have_non_var_grouping);
  
  	/*
--- 413,425 ----
  	clause = (Node *) qry->targetList;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate, qry,
  							groupClauses, have_non_var_grouping);
  
  	clause = (Node *) qry->havingQual;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate, qry,
  							groupClauses, have_non_var_grouping);
  
  	/*
***************
*** 535,546 **** parseCheckWindowFuncs(ParseState *pstate, Query *qry)
   * way more pain than the feature seems worth.
   */
  static void
! check_ungrouped_columns(Node *node, ParseState *pstate,
  						List *groupClauses, bool have_non_var_grouping)
  {
  	check_ungrouped_columns_context context;
  
  	context.pstate = pstate;
  	context.groupClauses = groupClauses;
  	context.have_non_var_grouping = have_non_var_grouping;
  	context.sublevels_up = 0;
--- 540,552 ----
   * way more pain than the feature seems worth.
   */
  static void
! check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
  						List *groupClauses, bool have_non_var_grouping)
  {
  	check_ungrouped_columns_context context;
  
  	context.pstate = pstate;
+ 	context.qry = qry;
  	context.groupClauses = groupClauses;
  	context.have_non_var_grouping = have_non_var_grouping;
  	context.sublevels_up = 0;
***************
*** 617,622 **** check_ungrouped_columns_walker(Node *node,
--- 623,641 ----
  					gvar->varlevelsup == 0)
  					return false;		/* acceptable, we're okay */
  			}
+ 
+ 			/* Check whether primary key of var's table is subset of group clauses. */
+ 			rte = rt_fetch(var->varno, context->pstate->p_rtable);
+ 			if (rte->rtekind == RTE_RELATION)
+ 			{
+ 				Oid			index_relid;
+ 
+ 				if (funcdeps_check_pk(context->groupClauses, rte->relid, var->varno, &index_relid))
+ 				{
+ 					context->qry->dependencies = lappend_oid(context->qry->dependencies, index_relid);
+ 					return false;
+ 				}
+ 			}
  		}
  
  		/* Found an ungrouped local variable; generate error message */
***************
*** 656,661 **** check_ungrouped_columns_walker(Node *node,
--- 675,755 ----
  }
  
  /*
+  * Check whether the attributes of the primary key relid with range table index
+  * rteno appear as a subset of the group_clauses.  (If so, a functional
+  * dependency exists between the group clauses and any attribute of the
+  * relation, and so attributes of the relation can appear ungrouped.)
+  */
+ static bool
+ funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid)
+ {
+ 	Relation	rel;
+ 	ListCell   *indexoidcell;
+ 
+ 	rel = heap_open(relid, AccessShareLock);
+ 
+ 	foreach(indexoidcell, RelationGetIndexList(rel))
+ 	{
+ 		Oid         indexoid = lfirst_oid(indexoidcell);
+ 		HeapTuple   indexTuple;
+ 		Form_pg_index indexStruct;
+ 		int         i;
+ 		bool		found_col;
+ 		bool		found_all_cols;
+ 
+ 		indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid));
+ 		if (!HeapTupleIsValid(indexTuple))
+ 			elog(ERROR, "cache lookup failed for index %u", indexoid);
+ 		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
+ 
+ 		if (!indexStruct->indisprimary || !indexStruct->indimmediate)
+ 			continue;
+ 
+ 		/*
+ 		 * Check that the group columns are a superset of the
+ 		 * primary key columns.
+ 		 */
+ 		for (i = 0; i < indexStruct->indnatts; i++)
+ 		{
+ 			int2			attnum;
+ 			ListCell	   *gl;
+ 
+ 			attnum = indexStruct->indkey.values[i];
+ 			found_col = false;
+ 
+ 			foreach(gl, group_clauses)
+ 			{
+ 				Var		   *gvar = (Var *) lfirst(gl);
+ 
+ 				if (IsA(gvar, Var) &&
+ 					gvar->varno == rteno &&
+ 					gvar->varattno == attnum &&
+ 					gvar->varlevelsup == 0)
+ 				{
+ 					found_col = true;
+ 					break;
+ 				}
+ 			}
+ 			if (!found_col)
+ 				break;
+ 		}
+ 		found_all_cols = (i == indexStruct->indnatts && found_col);
+ 
+ 		ReleaseSysCache(indexTuple);
+ 		if (found_all_cols)
+ 		{
+ 			heap_close(rel, NoLock);
+ 			*index_relid = indexoid;
+ 			return true;
+ 		}
+ 	}
+ 
+ 	heap_close(rel, NoLock);
+ 
+ 	return false;
+ }
+ 
+ /*
   * Create expression trees for the transition and final functions
   * of an aggregate.  These are needed so that polymorphic functions
   * can be used within an aggregate --- without the expression trees,
*** a/src/include/nodes/parsenodes.h
--- b/src/include/nodes/parsenodes.h
***************
*** 146,151 **** typedef struct Query
--- 146,154 ----
  
  	Node	   *setOperations;	/* set-operation tree if this is top level of
  								 * a UNION/INTERSECT/EXCEPT query */
+ 	List	   *dependencies;	/* list of index OIDs that are
+ 								 * required for the query to be
+ 								 * valid */
  } Query;
  
  
*** a/src/test/regress/parallel_schedule
--- b/src/test/regress/parallel_schedule
***************
*** 84,90 **** test: rules
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap
  
  # ----------
  # Another group of parallel tests
--- 84,90 ----
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps
  
  # ----------
  # Another group of parallel tests
*** a/src/test/regress/serial_schedule
--- b/src/test/regress/serial_schedule
***************
*** 76,81 **** test: union
--- 76,82 ----
  test: case
  test: join
  test: aggregates
+ test: functional_deps
  test: transactions
  ignore: random
  test: random

#29

Alex Hunsaker

badalex@gmail.com

over 15 years ago

In reply to: Alex Hunsaker (#28)

1 attachment(s)

Re: Functional dependencies and GROUP BY

On Fri, Jul 23, 2010 at 11:00, Alex Hunsaker <badalex@gmail.com> wrote:

Alternatively we could make it
work with just primary keys until the other patch gets in. I think
that makes sense, find that attached. Thoughts?

*sigh*, find attached a version that removes talk of unique not null
constraints from the docs

#30

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Alex Hunsaker (#28)

Re: Functional dependencies and GROUP BY

On fre, 2010-07-23 at 11:00 -0600, Alex Hunsaker wrote:

I just read that patch is getting pushed till at least the next commit
fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php

Should we push this patch back to? Alternatively we could make it
work with just primary keys until the other patch gets in. I think
that makes sense, find that attached. Thoughts?

I was thinking the same thing.

Note I axed the index not null attribute checking, I have not thought
to deeply about it other than if its a primary key it cant have non
null attributes.... Right? I may have missed something subtle hence
the heads up.

Another open question I thought of was whether we should put the
dependency record on the pg_index row, or the pg_constraint row, or
perhaps the pg_class row. Right now, it is using pg_index, because that
was easiest to code up, but I suspect that once we have not-null
constraints in pg_constraint, it will be more consistent to make all
dependencies go against pg_constraint rather than a mix of several
catalogs.

#31

Alex Hunsaker

badalex@gmail.com

over 15 years ago

In reply to: Peter Eisentraut (#30)

Re: Functional dependencies and GROUP BY

On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut <peter_e@gmx.net> wrote:

Another open question I thought of was whether we should put the
dependency record on the pg_index row, or the pg_constraint row, or
perhaps the pg_class row. Right now, it is using pg_index, because that
was easiest to code up, but I suspect that once we have not-null
constraints in pg_constraint, it will be more consistent to make all
dependencies go against pg_constraint rather than a mix of several
catalogs.

I think for primary keys pg_index is OK. However for the not-null
case we have to use pg_constraint... So given that we end up having to
code that anyways, it seems like it will end up being
cleaner/consistent to always use the pg_constraint row(s). So +1 for
using pg_constraint instead of pg_index from me.

#32

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Alex Hunsaker (#31)

1 attachment(s)

Re: Functional dependencies and GROUP BY

On mån, 2010-07-26 at 10:46 -0600, Alex Hunsaker wrote:

On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut <peter_e@gmx.net> wrote:

Another open question I thought of was whether we should put the
dependency record on the pg_index row, or the pg_constraint row, or
perhaps the pg_class row. Right now, it is using pg_index, because that
was easiest to code up, but I suspect that once we have not-null
constraints in pg_constraint, it will be more consistent to make all
dependencies go against pg_constraint rather than a mix of several
catalogs.

I think for primary keys pg_index is OK. However for the not-null
case we have to use pg_constraint... So given that we end up having to
code that anyways, it seems like it will end up being
cleaner/consistent to always use the pg_constraint row(s). So +1 for
using pg_constraint instead of pg_index from me.

Next version. Changed dependencies to pg_constraint, removed handling
of unique constraints for now, and made some enhancements so that views
track dependencies on constraints even in subqueries. Should be close
to final now. :-)

Attachments:

functional-deps.patchtext/x-patch; charset=UTF-8; name=functional-deps.patchDownload

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 416e599..a103229 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
     In this example, the columns <literal>product_id</literal>,
     <literal>p.name</literal>, and <literal>p.price</literal> must be
     in the <literal>GROUP BY</> clause since they are referenced in
-    the query select list.  (Depending on how the products
-    table is set up, name and price might be fully dependent on the
-    product ID, so the additional groupings could theoretically be
-    unnecessary, though this is not implemented.)  The column
+    the query select list (but see below).  The column
     <literal>s.units</> does not have to be in the <literal>GROUP
     BY</> list since it is only used in an aggregate expression
     (<literal>sum(...)</literal>), which represents the sales
@@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    </para>
 
    <para>
+    If the products table is set up so that,
+    say, <literal>product_id</literal> is the primary key, then it
+    would be enough to group by <literal>product_id</literal> in the
+    above example, since name and price would
+    be <firstterm>functionally
+    dependent</firstterm><indexterm><primary>functional
+    dependency</primary></indexterm> on the product ID, and so there
+    would be no ambiguity about which name and price value to return
+    for each product ID group.
+   </para>
+
+   <para>
     In strict SQL, <literal>GROUP BY</> can only group by columns of
     the source table but <productname>PostgreSQL</productname> extends
     this to also allow <literal>GROUP BY</> to group by columns in the
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 74021e8..8436d85 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -520,9 +520,12 @@ GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
     produces a single value computed across all the selected rows).
     When <literal>GROUP BY</literal> is present, it is not valid for
     the <command>SELECT</command> list expressions to refer to
-    ungrouped columns except within aggregate functions, since there
-    would be more than one possible value to return for an ungrouped
-    column.
+    ungrouped columns except within aggregate functions or if the
+    ungrouped column is functionally dependent on the grouped columns,
+    since there would otherwise be more than one possible value to
+    return for an ungrouped column.  A functional dependency exists if
+    the grouped columns (or a subset thereof) are the primary key of
+    the table containing the ungrouped column.
    </para>
   </refsect2>
 
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 67c90a2..455fc6e 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -16,7 +16,9 @@
 
 #include "access/heapam.h"
 #include "access/xact.h"
+#include "catalog/dependency.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_constraint.h"
 #include "commands/defrem.h"
 #include "commands/tablecmds.h"
 #include "commands/view.h"
@@ -390,6 +392,28 @@ UpdateRangeTableOfViewParse(Oid viewOid, Query *viewParse)
 }
 
 /*
+ * Walker to collect the constraintDeps fields of all Query nodes in a
+ * query tree into a single list.
+ */
+static bool
+collectConstraintDeps_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	else if (IsA(node, Query))
+	{
+		Query *query = (Query *) node;
+		List **listp = (List **) context;
+
+		*listp = list_concat_unique_oid(*listp, query->constraintDeps);
+
+		return query_tree_walker(query, collectConstraintDeps_walker, context, 0);
+	}
+	else
+		return expression_tree_walker(node, collectConstraintDeps_walker, context);
+}
+
+/*
  * DefineView
  *		Execute a CREATE VIEW command.
  */
@@ -399,6 +423,7 @@ DefineView(ViewStmt *stmt, const char *queryString)
 	Query	   *viewParse;
 	Oid			viewOid;
 	RangeVar   *view;
+	List	   *allConstraintDeps;
 
 	/*
 	 * Run parse analysis to convert the raw parse tree to a Query.  Note this
@@ -479,6 +504,30 @@ DefineView(ViewStmt *stmt, const char *queryString)
 	 */
 	CommandCounterIncrement();
 
+	allConstraintDeps = NIL;
+	collectConstraintDeps_walker((Node *) viewParse, &allConstraintDeps);
+
+	if (list_length(allConstraintDeps) > 0)
+	{
+		ObjectAddress myself, referenced;
+		ListCell *lc;
+
+		myself.classId = RelationRelationId;
+		myself.objectId = viewOid;
+		myself.objectSubId = 0;
+
+		foreach(lc, allConstraintDeps)
+		{
+			Oid constraint_oid = lfirst_oid(lc);
+
+			referenced.classId = ConstraintRelationId;
+			referenced.objectId = constraint_oid;
+			referenced.objectSubId = 0;
+
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+		}
+	}
+
 	/*
 	 * The range table of 'viewParse' does not contain entries for the "OLD"
 	 * and "NEW" relations. So... add them!
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9af1217..69262d6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2272,6 +2272,7 @@ _copyQuery(Query *from)
 	COPY_NODE_FIELD(limitCount);
 	COPY_NODE_FIELD(rowMarks);
 	COPY_NODE_FIELD(setOperations);
+	COPY_NODE_FIELD(constraintDeps);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 70b3c62..667057b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -877,6 +877,7 @@ _equalQuery(Query *a, Query *b)
 	COMPARE_NODE_FIELD(limitCount);
 	COMPARE_NODE_FIELD(rowMarks);
 	COMPARE_NODE_FIELD(setOperations);
+	COMPARE_NODE_FIELD(constraintDeps);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0454aa5..04a6647 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2020,6 +2020,7 @@ _outQuery(StringInfo str, Query *node)
 	WRITE_NODE_FIELD(limitCount);
 	WRITE_NODE_FIELD(rowMarks);
 	WRITE_NODE_FIELD(setOperations);
+	WRITE_NODE_FIELD(constraintDeps);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bc6e2a6..0a2edcb 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -218,6 +218,7 @@ _readQuery(void)
 	READ_NODE_FIELD(limitCount);
 	READ_NODE_FIELD(rowMarks);
 	READ_NODE_FIELD(setOperations);
+	READ_NODE_FIELD(constraintDeps);
 
 	READ_DONE();
 }
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 0a69bde..6e60563 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -14,6 +14,11 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_index.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/tlist.h"
@@ -23,21 +28,26 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
 
 
 typedef struct
 {
 	ParseState *pstate;
+	Query	   *qry;
 	List	   *groupClauses;
 	bool		have_non_var_grouping;
 	int			sublevels_up;
 } check_ungrouped_columns_context;
 
-static void check_ungrouped_columns(Node *node, ParseState *pstate,
+static void check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
 						List *groupClauses, bool have_non_var_grouping);
 static bool check_ungrouped_columns_walker(Node *node,
 							   check_ungrouped_columns_context *context);
+static bool funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *constraint_oid);
 
 
 /*
@@ -408,13 +418,13 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
 	clause = (Node *) qry->targetList;
 	if (hasJoinRTEs)
 		clause = flatten_join_alias_vars(root, clause);
-	check_ungrouped_columns(clause, pstate,
+	check_ungrouped_columns(clause, pstate, qry,
 							groupClauses, have_non_var_grouping);
 
 	clause = (Node *) qry->havingQual;
 	if (hasJoinRTEs)
 		clause = flatten_join_alias_vars(root, clause);
-	check_ungrouped_columns(clause, pstate,
+	check_ungrouped_columns(clause, pstate, qry,
 							groupClauses, have_non_var_grouping);
 
 	/*
@@ -535,12 +545,13 @@ parseCheckWindowFuncs(ParseState *pstate, Query *qry)
  * way more pain than the feature seems worth.
  */
 static void
-check_ungrouped_columns(Node *node, ParseState *pstate,
+check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
 						List *groupClauses, bool have_non_var_grouping)
 {
 	check_ungrouped_columns_context context;
 
 	context.pstate = pstate;
+	context.qry = qry;
 	context.groupClauses = groupClauses;
 	context.have_non_var_grouping = have_non_var_grouping;
 	context.sublevels_up = 0;
@@ -617,6 +628,19 @@ check_ungrouped_columns_walker(Node *node,
 					gvar->varlevelsup == 0)
 					return false;		/* acceptable, we're okay */
 			}
+
+			/* Check whether primary key of var's table is subset of group clauses. */
+			rte = rt_fetch(var->varno, context->pstate->p_rtable);
+			if (rte->rtekind == RTE_RELATION)
+			{
+				Oid			constraint_oid;
+
+				if (funcdeps_check_pk(context->groupClauses, rte->relid, var->varno, &constraint_oid))
+				{
+					context->qry->constraintDeps = list_append_unique_oid(context->qry->constraintDeps, constraint_oid);
+					return false;
+				}
+			}
 		}
 
 		/* Found an ungrouped local variable; generate error message */
@@ -656,6 +680,134 @@ check_ungrouped_columns_walker(Node *node,
 }
 
 /*
+ * Search pg_constraint for the constraint associated with the given
+ * index. To make this not too painfully slow, we use the index on
+ * conrelid, which holds the parent relation's OID, so pass that in as
+ * well.
+ */
+static Oid
+get_constraint_oid_for_index(Oid index_oid, Oid rel_oid)
+{
+	ScanKeyData	skey[1];
+	Relation	conrel;
+	SysScanDesc	conscan;
+	HeapTuple	htup;
+	Oid			result;
+
+	ScanKeyInit(&skey[0],
+				Anum_pg_constraint_conrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(rel_oid));
+
+	conrel = heap_open(ConstraintRelationId, AccessShareLock);
+	conscan = systable_beginscan(conrel, ConstraintRelidIndexId, true,
+								 SnapshotNow, 1, skey);
+
+	result = InvalidOid;
+
+	while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+	{
+		Form_pg_constraint conform = (Form_pg_constraint) GETSTRUCT(htup);
+
+		if (conform->contype == CONSTRAINT_PRIMARY &&
+			conform->conindid == index_oid)
+		{
+			result = HeapTupleGetOid(htup);
+			break;
+		}
+	}
+	systable_endscan(conscan);
+	heap_close(conrel, AccessShareLock);
+
+	if (!result)
+		elog(ERROR, "primary key constraint record missing for relation %u",
+			 rel_oid);
+
+	return result;
+}
+
+/*
+ * Check whether the attributes of the primary key or a not-null
+ * unique constraint of relid with range table index rteno appear as a
+ * subset of the group_clauses.  (If so, a functional dependency
+ * exists between the group clauses and any attribute of the relation,
+ * and so attributes of the relation can appear ungrouped.)
+ */
+static bool
+funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *constraint_oid)
+{
+	Relation	rel;
+	ListCell   *indexoidcell;
+
+	rel = heap_open(relid, AccessShareLock);
+
+	foreach(indexoidcell, RelationGetIndexList(rel))
+	{
+		Oid         indexoid = lfirst_oid(indexoidcell);
+		HeapTuple   indexTuple;
+		Form_pg_index indexStruct;
+		int         i;
+		bool		found_col;
+		bool		found_all_cols;
+
+		indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid));
+		if (!HeapTupleIsValid(indexTuple))
+			elog(ERROR, "cache lookup failed for index %u", indexoid);
+		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
+
+		/* TODO: also allow indisunique if we have a way to record
+		 * dependencies on not-null constraints */
+		if (!indexStruct->indisprimary
+			|| !indexStruct->indimmediate
+			|| !indexStruct->indisvalid
+			|| !indexStruct->indisready)
+			continue;
+
+		/*
+		 * Check that the group columns are a superset of the
+		 * primary key columns.
+		 */
+		for (i = 0; i < indexStruct->indnatts; i++)
+		{
+			int2			attnum;
+			ListCell	   *gl;
+
+			attnum = indexStruct->indkey.values[i];
+			found_col = false;
+
+			foreach(gl, group_clauses)
+			{
+				Var		   *gvar = (Var *) lfirst(gl);
+
+				if (IsA(gvar, Var) &&
+					gvar->varno == rteno &&
+					gvar->varattno == attnum &&
+					gvar->varlevelsup == 0)
+				{
+					found_col = true;
+					break;
+				}
+			}
+			if (!found_col)
+				break;
+		}
+		found_all_cols = (i == indexStruct->indnatts && found_col);
+
+		ReleaseSysCache(indexTuple);
+		if (found_all_cols)
+		{
+			heap_close(rel, NoLock);
+			*constraint_oid = get_constraint_oid_for_index(indexoid, relid);
+			return true;
+		}
+	}
+
+	heap_close(rel, NoLock);
+
+	return false;
+}
+
+/*
  * Create expression trees for the transition and final functions
  * of an aggregate.  These are needed so that polymorphic functions
  * can be used within an aggregate --- without the expression trees,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fec8d3c..254062b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -146,6 +146,9 @@ typedef struct Query
 
 	Node	   *setOperations;	/* set-operation tree if this is top level of
 								 * a UNION/INTERSECT/EXCEPT query */
+	List	   *constraintDeps;	/* list of constraint OIDs that are
+								 * required for the query to be
+								 * valid */
 } Query;
 
 
diff --git a/src/test/regress/expected/functional_deps.out b/src/test/regress/expected/functional_deps.out
new file mode 100644
index 0000000..64f29e3
--- /dev/null
+++ b/src/test/regress/expected/functional_deps.out
@@ -0,0 +1,228 @@
+-- from http://www.depesz.com/index.php/2010/04/19/getting-unique-elements/
+CREATE TABLE articles (
+    id int CONSTRAINT articles_pkey PRIMARY KEY,
+    keywords text,
+    title text UNIQUE NOT NULL,
+    body text UNIQUE,
+    created date
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "articles_pkey" for table "articles"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "articles_title_key" for table "articles"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "articles_body_key" for table "articles"
+CREATE TABLE articles_in_category (
+    article_id int,
+    category_id int,
+    changed date,
+    PRIMARY KEY (article_id, category_id)
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "articles_in_category_pkey" for table "articles_in_category"
+-- test functional dependencies based on primary keys/unique constraints
+-- base tables
+-- group by primary key (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by unique not null (fail/todo)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY title;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT id, keywords, title, body, created
+               ^
+-- group by unique nullable (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT id, keywords, title, body, created
+               ^
+-- group by something else (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY keywords;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT id, keywords, title, body, created
+               ^
+-- multiple tables
+-- group by primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+ERROR:  column "a.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.id, a.keywords, a.title, a.body, a.created
+               ^
+-- JOIN syntax
+-- group by left table's primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+ id | keywords | title | body | created 
+----+----------+-------+------+---------
+(0 rows)
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+ERROR:  column "a.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT a.id, a.keywords, a.title, a.body, a.created
+               ^
+-- group by right table's (composite) primary key (OK)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.category_id, aic.article_id;
+ changed 
+---------
+(0 rows)
+
+-- group by right table's partial primary key (fail)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id;
+ERROR:  column "aic.changed" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT aic.changed
+               ^
+-- example from documentation
+CREATE TABLE products (product_id int, name text, price numeric);
+CREATE TABLE sales (product_id int, units int);
+-- OK
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id, p.name, p.price;
+ product_id | name | sales 
+------------+------+-------
+(0 rows)
+
+-- fail
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+ERROR:  column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 1: SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+                           ^
+ALTER TABLE products ADD PRIMARY KEY (product_id);
+NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "products_pkey" for table "products"
+-- OK now
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+ product_id | name | sales 
+------------+------+-------
+(0 rows)
+
+-- Drupal example, http://drupal.org/node/555530
+CREATE TABLE node (
+    nid SERIAL,
+    vid integer NOT NULL default '0',
+    type varchar(32) NOT NULL default '',
+    title varchar(128) NOT NULL default '',
+    uid integer NOT NULL default '0',
+    status integer NOT NULL default '1',
+    created integer NOT NULL default '0',
+    -- snip
+    PRIMARY KEY (nid, vid)
+);
+NOTICE:  CREATE TABLE will create implicit sequence "node_nid_seq" for serial column "node.nid"
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "node_pkey" for table "node"
+CREATE TABLE users (
+    uid integer NOT NULL default '0',
+    name varchar(60) NOT NULL default '',
+    pass varchar(32) NOT NULL default '',
+    -- snip
+    PRIMARY KEY (uid),
+    UNIQUE (name)
+);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "users_pkey" for table "users"
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "users_name_key" for table "users"
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid, u.name;
+ uid | name 
+-----+------
+(0 rows)
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid;
+ uid | name 
+-----+------
+(0 rows)
+
+-- views and dependencies
+-- fail
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+ERROR:  column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
+LINE 2: SELECT id, keywords, title, body, created
+               ^
+-- OK
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+-- fail
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
+ERROR:  cannot drop constraint articles_pkey on table articles because other objects depend on it
+DETAIL:  view fdv1 depends on constraint articles_pkey on table articles
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+DROP VIEW fdv1;
+-- multiple dependencies
+CREATE VIEW fdv2 AS
+SELECT a.id, a.keywords, a.title, aic.category_id, aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id, aic.category_id, aic.article_id;
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT; -- fail
+ERROR:  cannot drop constraint articles_pkey on table articles because other objects depend on it
+DETAIL:  view fdv2 depends on constraint articles_pkey on table articles
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+ALTER TABLE articles_in_category DROP CONSTRAINT articles_in_category_pkey RESTRICT; --fail
+ERROR:  cannot drop constraint articles_in_category_pkey on table articles_in_category because other objects depend on it
+DETAIL:  view fdv2 depends on constraint articles_in_category_pkey on table articles_in_category
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+DROP VIEW fdv2;
+-- nested queries
+CREATE VIEW fdv3 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id
+UNION
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT; -- fail
+ERROR:  cannot drop constraint articles_pkey on table articles because other objects depend on it
+DETAIL:  view fdv3 depends on constraint articles_pkey on table articles
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+DROP VIEW fdv3;
+CREATE VIEW fdv4 AS
+SELECT * FROM articles WHERE title IN (SELECT title FROM articles GROUP BY id);
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT; -- fail
+ERROR:  cannot drop constraint articles_pkey on table articles because other objects depend on it
+DETAIL:  view fdv4 depends on constraint articles_pkey on table articles
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+DROP VIEW fdv4;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 7529777..191d1fe 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -84,7 +84,7 @@ test: rules
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap
+test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 5f185f9..e38d5f0 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -76,6 +76,7 @@ test: union
 test: case
 test: join
 test: aggregates
+test: functional_deps
 test: transactions
 ignore: random
 test: random
diff --git a/src/test/regress/sql/functional_deps.sql b/src/test/regress/sql/functional_deps.sql
new file mode 100644
index 0000000..b090a37
--- /dev/null
+++ b/src/test/regress/sql/functional_deps.sql
@@ -0,0 +1,196 @@
+-- from http://www.depesz.com/index.php/2010/04/19/getting-unique-elements/
+
+CREATE TABLE articles (
+    id int CONSTRAINT articles_pkey PRIMARY KEY,
+    keywords text,
+    title text UNIQUE NOT NULL,
+    body text UNIQUE,
+    created date
+);
+
+CREATE TABLE articles_in_category (
+    article_id int,
+    category_id int,
+    changed date,
+    PRIMARY KEY (article_id, category_id)
+);
+
+-- test functional dependencies based on primary keys/unique constraints
+
+-- base tables
+
+-- group by primary key (OK)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+
+-- group by unique not null (fail/todo)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY title;
+
+-- group by unique nullable (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+
+-- group by something else (fail)
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY keywords;
+
+-- multiple tables
+
+-- group by primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a, articles_in_category AS aic
+WHERE a.id = aic.article_id AND aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+
+-- JOIN syntax
+
+-- group by left table's primary key (OK)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id;
+
+-- group by something else (fail)
+SELECT a.id, a.keywords, a.title, a.body, a.created
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id, aic.category_id;
+
+-- group by right table's (composite) primary key (OK)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.category_id, aic.article_id;
+
+-- group by right table's partial primary key (fail)
+SELECT aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY aic.article_id;
+
+
+-- example from documentation
+
+CREATE TABLE products (product_id int, name text, price numeric);
+CREATE TABLE sales (product_id int, units int);
+
+-- OK
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id, p.name, p.price;
+
+-- fail
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+
+ALTER TABLE products ADD PRIMARY KEY (product_id);
+
+-- OK now
+SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
+    FROM products p LEFT JOIN sales s USING (product_id)
+    GROUP BY product_id;
+
+
+-- Drupal example, http://drupal.org/node/555530
+
+CREATE TABLE node (
+    nid SERIAL,
+    vid integer NOT NULL default '0',
+    type varchar(32) NOT NULL default '',
+    title varchar(128) NOT NULL default '',
+    uid integer NOT NULL default '0',
+    status integer NOT NULL default '1',
+    created integer NOT NULL default '0',
+    -- snip
+    PRIMARY KEY (nid, vid)
+);
+
+CREATE TABLE users (
+    uid integer NOT NULL default '0',
+    name varchar(60) NOT NULL default '',
+    pass varchar(32) NOT NULL default '',
+    -- snip
+    PRIMARY KEY (uid),
+    UNIQUE (name)
+);
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid, u.name;
+
+-- OK
+SELECT u.uid, u.name FROM node n
+INNER JOIN users u ON u.uid = n.uid
+WHERE n.type = 'blog' AND n.status = 1
+GROUP BY u.uid;
+
+
+-- views and dependencies
+
+-- fail
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY body;
+
+-- OK
+CREATE VIEW fdv1 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+
+-- fail
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
+
+DROP VIEW fdv1;
+
+
+-- multiple dependencies
+CREATE VIEW fdv2 AS
+SELECT a.id, a.keywords, a.title, aic.category_id, aic.changed
+FROM articles AS a JOIN articles_in_category AS aic ON a.id = aic.article_id
+WHERE aic.category_id in (14,62,70,53,138)
+GROUP BY a.id, aic.category_id, aic.article_id;
+
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT; -- fail
+ALTER TABLE articles_in_category DROP CONSTRAINT articles_in_category_pkey RESTRICT; --fail
+
+DROP VIEW fdv2;
+
+
+-- nested queries
+
+CREATE VIEW fdv3 AS
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id
+UNION
+SELECT id, keywords, title, body, created
+FROM articles
+GROUP BY id;
+
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT; -- fail
+
+DROP VIEW fdv3;
+
+
+CREATE VIEW fdv4 AS
+SELECT * FROM articles WHERE title IN (SELECT title FROM articles GROUP BY id);
+
+ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT; -- fail
+
+DROP VIEW fdv4;

#33

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Peter Eisentraut (#32)

Re: Functional dependencies and GROUP BY

Peter Eisentraut <peter_e@gmx.net> writes:

Next version. Changed dependencies to pg_constraint, removed handling
of unique constraints for now, and made some enhancements so that views
track dependencies on constraints even in subqueries. Should be close
to final now. :-)

I've committed this with some revisions, notably:

The view.c changes were fundamentally wrong. The right place to
extract dependencies from a parsetree is in dependency.c, specifically
find_expr_references_walker. The way you were doing it meant that
dependencies on constraints would only be noticed for views, not for
other cases such as stored plans.

I rewrote the catalog search to look only at pg_constraint, not using
pg_index at all. I think this will be easier to extend to the case of
looking for UNIQUE + NOT NULL, whenever we get around to doing that.
I also moved the search into catalog/pg_constraint.c, because it didn't
seem to belong in parse_agg (as hinted by the large number of #include
additions you had to make to put it there).

I put in a bit of caching logic to prevent repeating the search for
multiple Vars of the same relation. Tests or no tests, I can't believe
that's going to be cheap enough that we want to repeat it over and
over...

regards, tom lane

#34

Dean Rasheed

dean.a.rasheed@gmail.com

over 15 years ago

In reply to: Tom Lane (#33)

Re: Functional dependencies and GROUP BY

On 7 August 2010 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Next version. Changed dependencies to pg_constraint, removed handling
of unique constraints for now, and made some enhancements so that views
track dependencies on constraints even in subqueries. Should be close
to final now. :-)

I've committed this with some revisions, notably:

The view.c changes were fundamentally wrong. The right place to
extract dependencies from a parsetree is in dependency.c, specifically
find_expr_references_walker. The way you were doing it meant that
dependencies on constraints would only be noticed for views, not for
other cases such as stored plans.

I rewrote the catalog search to look only at pg_constraint, not using
pg_index at all. I think this will be easier to extend to the case of
looking for UNIQUE + NOT NULL, whenever we get around to doing that.
I also moved the search into catalog/pg_constraint.c, because it didn't
seem to belong in parse_agg (as hinted by the large number of #include
additions you had to make to put it there).

I put in a bit of caching logic to prevent repeating the search for
multiple Vars of the same relation. Tests or no tests, I can't believe
that's going to be cheap enough that we want to repeat it over and
over...

regards, tom lane

I was testing out this feature this morning and discovered that the
results may be non-deterministic if the PK is deferrable. I think that
check_functional_grouping() should exclude any deferrable constraints,
since in general, any inference based on a deferrable constraint can't
be trusted when planning a query, since the constraint can't be
guaranteed to be valid when the query is executed. That's also
consistent with the SQL spec.

The original version of the patch had that check in it, but it
vanished from the final committed version. Was that just an oversight,
or an intentional change?

Regards,
Dean

#35

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Dean Rasheed (#34)

Re: Functional dependencies and GROUP BY

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On 7 August 2010 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I was testing out this feature this morning and discovered that the
results may be non-deterministic if the PK is deferrable.

Good point.

The original version of the patch had that check in it, but it
vanished from the final committed version. Was that just an oversight,
or an intentional change?

I don't recall having thought about it one way or the other. What did
the check look like?

regards, tom lane

#36

Dean Rasheed

dean.a.rasheed@gmail.com

over 15 years ago

In reply to: Tom Lane (#35)

Re: Functional dependencies and GROUP BY

On 5 September 2010 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On 7 August 2010 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I was testing out this feature this morning and discovered that the
results may be non-deterministic if the PK is deferrable.

Good point.

The original version of the patch had that check in it, but it
vanished from the final committed version. Was that just an oversight,
or an intentional change?

I don't recall having thought about it one way or the other. What did
the check look like?

Well originally it was searching indexes rather than constraints, and
funcdeps_check_pk() included the following check:

if (!indexStruct->indisprimary || !indexStruct->indimmediate)
continue;

Now its looping over pg_constraint entries, so I guess anything wtih
con->condeferrable == true should be ignored.

Regards,
Dean

#37

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Dean Rasheed (#36)

Re: Functional dependencies and GROUP BY

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On 5 September 2010 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't recall having thought about it one way or the other. �What did
the check look like?

Well originally it was searching indexes rather than constraints, and
funcdeps_check_pk() included the following check:

if (!indexStruct->indisprimary || !indexStruct->indimmediate)
continue;

Now its looping over pg_constraint entries, so I guess anything wtih
con->condeferrable == true should be ignored.

Seems reasonable, will fix. Thanks for the report!

regards, tom lane

#38

Peter Eisentraut

peter_e@gmx.net

over 15 years ago

In reply to: Tom Lane (#37)

Re: Functional dependencies and GROUP BY

On sön, 2010-09-05 at 11:35 -0400, Tom Lane wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On 5 September 2010 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't recall having thought about it one way or the other. What did
the check look like?

Well originally it was searching indexes rather than constraints, and
funcdeps_check_pk() included the following check:

if (!indexStruct->indisprimary || !indexStruct->indimmediate)
continue;

Now its looping over pg_constraint entries, so I guess anything wtih
con->condeferrable == true should be ignored.

Seems reasonable, will fix. Thanks for the report!

Yes, the SQL standard explicitly requires the constraint in question to
be immediate.