FIRST_VALUE: need to group by argument?

Started by Guyren Howealmost 10 years ago4 messagesgeneral

guyren@gmail.com

almost 10 years ago

I am missing something here.

I have two tables:

orders
id

delivery_route_segments
id,
order_id,
position,
completed

I want to find the first uncompleted deliver_route_segment for each order, by position. Seems to me I ought to be able to do this:

SELECT
o.id,
FIRST_VALUE(drs.id)
FROM
orders o JOIN
delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
o.id
but I'm told I need an over clause.

So I try this:
SELECT
o.id,
FIRST_VALUE(drs.id) OVER (ORDER BY position ASC)
FROM
orders o JOIN
delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
o.id
here I'm told "drs.id must appear in the GROUP BY clause". This doesn't make sense to me; I shouldn't need to group by a value that's inside an aggregate function.

Tried this.
SELECT
o.id,
FIRST_VALUE(drs.id) OVER (PARTITION BY o.id ORDER BY position ASC)
FROM
orders o JOIN
delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
o.id
but it has the same problem.

I can solve this with a subquery, but:
- I'd still like to know what's wrong; and
- I expect the subquery to be slower (yes?)

David G. Johnston

david.g.johnston@gmail.com

almost 10 years ago

In reply to: Guyren Howe (#1)

Re: FIRST_VALUE: need to group by argument?

On Mon, May 23, 2016 at 11:48 PM, Guyren Howe <guyren@gmail.com> wrote:

I am missing something here.

I have two tables:

orders
id

delivery_route_segments
id,
order_id,
position,
completed

I want to find the first uncompleted deliver_route_segment for each order,
by position. Seems to me I ought to be able to do this:

SELECT
o.id,
FIRST_VALUE(drs.id)
FROM
orders o JOIN
delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
o.id

but I'm told I need an over clause.

So I try this:

SELECT
o.id,
FIRST_VALUE(drs.id) OVER (ORDER BY position ASC)
FROM
orders o JOIN
delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
o.id

here I'm told "drs.id must appear in the GROUP BY clause". This doesn't
make sense to me; I shouldn't need to group by a value that's inside an
aggregate function.

Tried this.

SELECT
o.id,
FIRST_VALUE(drs.id) OVER (PARTITION BY o.id ORDER BY position ASC)
FROM
orders o JOIN
delivery_route_segments drs ON (drs.order_id = o.id AND NOT drs.completed)
GROUP BY
o.id

but it has the same problem.

I can solve this with a subquery, but:

- I'd still like to know what's wrong; and

- I expect the subquery to be slower (yes?)

Window functions don't provoke grouping. That it their blessing. The

docs on them are pretty good. first_value is strictly a window function (I
may have missed this point when you first asked the question a few days
ago).

SELECT i, sum(i) OVER () FROM ( VALUES (a,1), (a,2), (b,3) ) val (x,i)
yields
1, 6
2, 6
3, 6

You can see how all three rows are still present in the output - thus no
grouping.

SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM ( VALUES (a,1),
(a,2), (b,3) ) val (x,i) GROUP BY x
yields
a, 3, 6
b, 3, 6

Here you have to write sum(sum(i)) in the window expression since i is not
in the group by and must first be aggregated somehow to satisfy the group
by and then the corresponding window evaluation sums those aggregated i's.

The query you are looking for doesn't involve aggregate functions at all.

SELECT DISTINCT ON (order_id) order_id, drs.id
FROM delivery_route_segments drs
WHERE NOT drs.completed
ORDER BY order_id, position ASC

A semantically equivalent query using a window function would be.

WITH ordered_routes AS (
SELECT drs.order_id, drs.id AS drsid, row_number() OVER (PARTITION BY
order_id ORDER BY position ASC) AS order_row
FROM delivery_route_segments drs
)
SELECT order_id, drsid
FROM ordered_routes
WHERE order_row = 1;

As I mentioned in the other thread if you are using group by to simply
remove duplicates introduced by a join, like you are here, you are probably
doing something wrong.

David J.

targen@gmail.com

almost 10 years ago

In reply to: David G. Johnston (#2)

Re: FIRST_VALUE: need to group by argument?

On Tue, May 24, 2016 at 8:50 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM ( VALUES (a,1),
(a,2), (b,3) ) val (x,i) GROUP BY x
yields
a, 3, 6
b, 3, 6

Thank you for this enlightening explanation! I was, however, very
confused from this specific bit, so I tested it and indeed there must
have been a typo:

postgres=# SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM
(VALUES ('a', 1), ('a', 2), ('b', 3)) val (x,i) GROUP BY x;
x | sum | sum
---+-----+-----
a | 3 | 3
b | 3 | 3
(2 rows)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

David G. Johnston

david.g.johnston@gmail.com

almost 10 years ago

In reply to: Manuel Gómez (#3)

Re: FIRST_VALUE: need to group by argument?

On Tue, May 24, 2016 at 12:12 PM, Manuel Gómez <targen@gmail.com> wrote:

On Tue, May 24, 2016 at 8:50 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM ( VALUES (a,1),
(a,2), (b,3) ) val (x,i) GROUP BY x
yields
a, 3, 6
b, 3, 6

Thank you for this enlightening explanation! I was, however, very
confused from this specific bit, so I tested it and indeed there must
have been a typo:

postgres=# SELECT x, sum(i), sum(sum(i)) OVER (PARTITION BY x) FROM
(VALUES ('a', 1), ('a', 2), ('b', 3)) val (x,i) GROUP BY x;
x | sum | sum
---+-----+-----
a | 3 | 3
b | 3 | 3
(2 rows)

Yes, that was me being careless. The partition in the window makes the
sum(sum(i)) evaluate to 3

David J.