A GROUP BY question
I am trying to write, what is for me, a fairly complex query. It uses JOINS,
and also GROUP BY. I have this working with the exception of adding the
GROUP BY clause.
Is there some reason I cannot add a GROUP BY function to a JOIN?
Here is what I have:
CREATE OR REPLACE view tasks_view as
select
project.proj_no ,
employee.first_name ,
employee.last_name ,
employee.id ,
task_instance.hours ,
work_type.type,
work_type.descrip,
rate.rate,
employee.hourly_rate ,
rate.rate * task_instance.hours as result ,
SUM(rate.rate * task_instance.hours)
^^^^^^^^^^^^^^
from
task_instance
GROUP BY
^^^^^^^^^^^^^^^^^^
project.project_key
^^^^^^^^^^^^^^^^^^^^^^
join rate on
rate.employee_key = task_instance.employee_key
AND
rate.work_type_key = task_instance.work_type_key
inner join employee on
rate.employee_key = employee.employee_key
inner join work_type on
rate.work_type_key = work_type.work_type_key
inner join project on
project.project_key = task_instance.project_key
ORDER BY
project.proj_no ,
employee.id
;
The underlined lines are what I added, and I get a syntax error pointing to
the join. This works fine without the added lines.
--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
-- Benjamin Franklin
On Tue, Aug 13, 2019 at 05:54:03AM -0400, stan wrote:
I am trying to write, what is for me, a fairly complex query. It uses JOINS,
and also GROUP BY. I have this working with the exception of adding the
GROUP BY clause.Is there some reason I cannot add a GROUP BY function to a JOIN?
Here is what I have:
CREATE OR REPLACE view tasks_view as
select
project.proj_no ,
employee.first_name ,
employee.last_name ,
employee.id ,
task_instance.hours ,
work_type.type,
work_type.descrip,
rate.rate,
employee.hourly_rate ,
rate.rate * task_instance.hours as result ,
SUM(rate.rate * task_instance.hours)
^^^^^^^^^^^^^^
from
task_instance
GROUP BY
^^^^^^^^^^^^^^^^^^
project.project_key
^^^^^^^^^^^^^^^^^^^^^^
join rate on
rate.employee_key = task_instance.employee_key
AND
rate.work_type_key = task_instance.work_type_key
inner join employee on
rate.employee_key = employee.employee_key
inner join work_type on
rate.work_type_key = work_type.work_type_key
inner join project on
project.project_key = task_instance.project_key
ORDER BY
project.proj_no ,
employee.id
;
Maybe I have a basic misunderstanding. What I am trying to get is a total
cost for each project. This would be calculated by multiplying rate and
hours for each row, on a per project base, and then summing all of th
products of this multiplication.
I did get the following to be accepted from a syntax basis, but it returns
rows with the product for each row, and something in the sum column which
is the same.
DROP view tasks_view ;
CREATE OR REPLACE view tasks_view as
select
project.proj_no ,
employee.first_name ,
employee.last_name ,
employee.id ,
task_instance.hours ,
work_type.type,
work_type.descrip,
rate.rate,
employee.hourly_rate ,
rate.rate * task_instance.hours as result ,
SUM (rate.rate * task_instance.hours)
from
task_instance
join rate on
rate.employee_key = task_instance.employee_key
AND
rate.work_type_key = task_instance.work_type_key
inner join employee on
rate.employee_key = employee.employee_key
inner join work_type on
rate.work_type_key = work_type.work_type_key
inner join project on
project.project_key = task_instance.project_key
GROUP BY
project.project_key ,
employee.first_name ,
employee.last_name ,
employee.id ,
task_instance.hours ,
work_type.type,
work_type.descrip,
rate.rate,
employee.hourly_rate
ORDER BY
project.proj_no
;
--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
-- Benjamin Franklin
On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote:
select
project.proj_no ,
Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way.
SUM (rate.rate * task_instance.hours)
from
task_instance
join rate on
rate.employee_key = task_instance.employee_key
AND
rate.work_type_key = task_instance.work_type_key
(break)
inner join employee on
rate.employee_key = employee.employee_key
inner join work_type on
rate.work_type_key = work_type.work_type_key
These are now probably redundant, you don’t need them unless they filter your results.
inner join project on
project.project_key = task_instance.project_key
And this JOIN could be dropped if project_key and proj_no weren’t different fields. If both are unique in project, you could drop one of them and keep the same functionality with fewer joins. That said, in the “war” between surrogate and natural keys I’m on the natural keys side. Clearly, not everyone agrees on that.
GROUP BY
project.project_key ,
Same columns removed here too.
ORDER BY
project.proj_no
;
That should give you the total cost for each project.
You could get the same result repeated per employee and per work type as you tried originally, by putting the above revised query as a subquery and joining that back into the full query in the place of your project-related tables (add the project_key so you have something to join against).
The repeated sum risks getting multiplied in the final output though, especially if unaware people will be putting the results in an Excel sheet or something. From experience, that either results in people reporting the wrong financial results (several orders too high) or blaming your query.
Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Hi Stan,
Am Dienstag, 13. August 2019, 13:10:18 CEST schrieb stan:
Maybe I have a basic misunderstanding. What I am trying to get is a total
cost for each project. This would be calculated by multiplying rate and
hours for each row, on a per project base, and then summing all of th
products of this multiplication.
First of all, complex views including many joins, group by, having etc usually tend to
show bad performance on large dataset if used for anything else than simple select from
... statement. So at least my personal experience.
CREATE OR REPLACE view tasks_view as
select
project.proj_no ,
employee.first_name ,
employee.last_name ,
employee.id ,
task_instance.hours ,
work_type.type,
work_type.descrip,
rate.rate,
employee.hourly_rate ,
rate.rate * task_instance.hours as result ,
--------------------^
this gives you the product *per line*
SUM (rate.rate * task_instance.hours)
--------------------^
this gives you *sum of all products over all grouped lines*
from
task_instance
join rate on
rate.employee_key = task_instance.employee_key
AND
rate.work_type_key = task_instance.work_type_key
inner join employee on
rate.employee_key = employee.employee_key
inner join work_type on
rate.work_type_key = work_type.work_type_key
inner join project on
project.project_key = task_instance.project_key
GROUP BY
project.project_key ,
employee.first_name ,
employee.last_name ,
employee.id ,
task_instance.hours ,
work_type.type,
work_type.descrip,
rate.rate,
employee.hourly_rate
---------------------^
You group by the columns you use in the sum, so you will get no sum at all, but the
product *per line* as selected just before the sum
ORDER BY
project.proj_no
;
You will have to find out if you really need to group by some lines, and take a sum over
those lines or need the product (rate.rate * task_instance.hours) per line.
--
Best regards
Jan
On Tuesday, August 13, 2019, Alban Hertroys <haramrae@gmail.com> wrote:
On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote:
select
project.proj_no ,Removed columns that get in the way of your desired result. You can’t have
both details and the sum over them in a meaningful way.
Sure you can, at least generally, with Window Functions/Expressions (i.e.,
OVER)
David J.
On 13 Aug 2019, at 15:19, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, August 13, 2019, Alban Hertroys <haramrae@gmail.com> wrote:
On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote:
select
project.proj_no ,Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way.
Sure you can, at least generally, with Window Functions/Expressions (i.e., OVER)
That’s why I added “in a meaningful way” ;)
Repeating the same SUM-result on every line in a group is not what I’d call a meaningful result; the SUM has no bearing on the detailed line and leads to the kind of mistakes I already mentioned.
(For the record; I do this kind of grouping in a hierarchical database regularly, but there the grouped SUM is at a different level in the hierarchy and I consider it thus sufficiently separated from the detail rows.)
Besides, I figured the OP was already struggling with the query syntax, adding window functions into the mix didn’t seem a good idea in the context. Possible?, sure, desirable?, I would say not.
Alban Hertroys
--
There is always an exception to always.