UNION + GROUP BY bug located
I noticed a couple of days ago that the current sources coredump if you
try to use GROUP BY on the first sub-SELECT of a UNION, eg
create table category (name text, image text, url text, parent oid);
select name from category group by name
union select image from category;
=> kerboom
(It works if you put a GROUP BY on the second select, though. 6.4.2
didn't coredump in a cursory test, but it didn't produce the right
answers either.)
A check of the mail archives shows that Bill Carlson reported this
bug to pgsql-sql on 22 April, but I'd not picked up on it at the time.
The cause is that plan_union_queries() is failing to clear out the
groupclause before it returns control to union_planner, so the GROUP BY
gets applied twice, once to the subselect and once (incorrectly) to the
result of the UNION. (This wouldn't have happened with a less klugy
representation for UNION parsetrees, but I digress.) You can see this
happening if you look at the EXPLAIN output; the coredump only happens
at execution time.
This patch fixes it:
*** backend/optimizer/prep/prepunion.c.orig Sun Jun 6 13:38:11 1999
--- backend/optimizer/prep/prepunion.c Wed Jun 9 20:38:48 1999
***************
*** 192,197 ****
--- 192,204 ----
/* needed so we don't take the flag from the first query */
parse->uniqueFlag = NULL;
+ /* Make sure we don't try to apply the first query's grouping stuff
+ * to the Append node, either. Basically we don't want union_planner
+ * to do anything when we return control, except add the top sort/unique
+ * nodes for DISTINCT processing if this wasn't UNION ALL, or the top
+ * sort node if it was UNION ALL with a user-provided sort clause.
+ */
+ parse->groupClause = NULL;
parse->havingQual = NULL;
parse->hasAggs = false;
I feel fairly confident that this is a low-risk patch; certainly
it cannot break anything that doesn't involve GROUP BY and UNION.
Is there any objection to my committing it at this late hour?
regards, tom lane
The cause is that plan_union_queries() is failing to clear out the
groupclause before it returns control to union_planner, so the GROUP BY
gets applied twice, once to the subselect and once (incorrectly) to the
result of the UNION. (This wouldn't have happened with a less klugy
representation for UNION parsetrees, but I digress.) You can see this
happening if you look at the EXPLAIN output; the coredump only happens
at execution time.
Is going to be similar to LIKE indexing, where everyone beats me up
about it, but the code remains unchanged because no one can think of a
better/cleaner idea? But i digress... :-)
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026