Window function?

Started by Robert Stanfordalmost 4 years ago6 messagesgeneral
Jump to latest
#1Robert Stanford
rstanford@gmail.com

Hi,

I have time series data from multiple inputs with start and
end timestamps.

Can anyone suggest an elegant way to coalesce consecutive rows so only the
first start time and last end time for each group of events (by input) is
returned.

Sample from and to below where the rows for Input number 4 could be
massaged.
(Timestamps selected with timestamp(0) for convenience)

From this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:08:50 2022-06-04 09:09:00
4 2022-06-04 09:08:10 2022-06-04 09:08:50
4 2022-06-04 09:07:47 2022-06-04 09:08:10
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:06:37 2022-06-04 09:06:47
4 2022-06-04 09:06:29 2022-06-04 09:06:37
4 2022-06-04 09:06:17 2022-06-04 09:06:29
4 2022-06-04 09:05:53 2022-06-04 09:06:17
16 2022-06-04 09:04:33 2022-06-04 09:05:53

To this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:07:47 2022-06-04 09:09:00
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:05:53 2022-06-04 09:06:47
16 2022-06-04 09:04:33 2022-06-04 09:05:53

Thanks in advance to anyone who can help!
Robert

#2Thiemo Kellner
thiemo@gelassene-pferde.biz
In reply to: Robert Stanford (#1)
Re: Window function?

Hi Robert

Interesting problem. I need to think about it.

You need to figure out when Input changes. You can achieve this by using
lead or lag (depending of the sort direction over start)
https://www.postgresql.org/docs/current/functions-window.html .

Hope this nudges you to a solution.

Kind regards

Thiemo

Am 04.06.22 um 10:18 schrieb Robert Stanford:

Show quoted text

Hi,

I have time series data from multiple inputs with start and
end timestamps.

Can anyone suggest an elegant way to coalesce consecutive rows so only
the
first start time and last end time for each group of events (by input)
is returned.

Sample from and to below where the rows for Input number 4 could be
massaged.
(Timestamps selected with timestamp(0) for convenience)

From this:
Input   Start                 End
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:08:50   2022-06-04 09:09:00
4       2022-06-04 09:08:10   2022-06-04 09:08:50
4       2022-06-04 09:07:47   2022-06-04 09:08:10
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:06:37   2022-06-04 09:06:47
4       2022-06-04 09:06:29   2022-06-04 09:06:37
4       2022-06-04 09:06:17   2022-06-04 09:06:29
4       2022-06-04 09:05:53   2022-06-04 09:06:17
16      2022-06-04 09:04:33   2022-06-04 09:05:53

To this:
Input   Start                 End
5       2022-06-04 09:09:00   2022-06-04 09:09:29
4       2022-06-04 09:07:47   2022-06-04 09:09:00
17      2022-06-04 09:06:47   2022-06-04 09:07:47
4       2022-06-04 09:05:53   2022-06-04 09:06:47
16      2022-06-04 09:04:33   2022-06-04 09:05:53

Thanks in advance to anyone who can help!
Robert

#3Robert Stanford
rstanford@gmail.com
In reply to: Thiemo Kellner (#2)
Re: Window function?

On Sat, 4 Jun 2022 at 22:56, Thiemo Kellner <thiemo@gelassene-pferde.biz>
wrote:

Hi Robert

Interesting problem. I need to think about it.

You need to figure out when Input changes. You can achieve this by using
lead or lag (depending of the sort direction over start)
https://www.postgresql.org/docs/current/functions-window.html .

After a bit of poking around this seems to get things going in the right
direction.

select row_number() over (order by start) - row_number() over (partition
by input order by start)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Stanford (#1)
Re: Window function?

Robert Stanford <rstanford@gmail.com> writes:

Can anyone suggest an elegant way to coalesce consecutive rows so only the
first start time and last end time for each group of events (by input) is
returned.

Window functions don't change the number of query result rows, so that
idea is a dead end. The standard way to solve this is GROUP BY:

SELECT Input, MIN(Start) AS Start, MAX(End) AS End
FROM ...
GROUP BY Input
ORDER BY Input -- optional

If MIN/MAX don't quite express what you need to happen, then you
might need to write custom aggregates. Point though is that
you want aggregation within GROUP BY groups, not windowing.

regards, tom lane

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#4)
Re: Window function?

On Sat, Jun 4, 2022 at 7:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Stanford <rstanford@gmail.com> writes:

Can anyone suggest an elegant way to coalesce consecutive rows so only

the

first start time and last end time for each group of events (by input) is
returned.

Window functions don't change the number of query result rows, so that
idea is a dead end. The standard way to solve this is GROUP BY:

SELECT Input, MIN(Start) AS Start, MAX(End) AS End
FROM ...
GROUP BY Input
ORDER BY Input -- optional

Assuming that input group 4 really is supposed to show up twice in the
output then the final solution is going to have to be a combination of this
and window functions. You need the later in order to be able to say "input
4, appearance 1" and "input 4, appearance 2" - which lead/lag tends to help
with. Then the group by becomes {input, appearance}.

David J.

#6John W Higgins
wishdev@gmail.com
In reply to: Robert Stanford (#1)
Re: Window function?

On Sat, Jun 4, 2022 at 1:18 AM Robert Stanford <rstanford@gmail.com> wrote:

Hi,

From this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:08:50 2022-06-04 09:09:00
4 2022-06-04 09:08:10 2022-06-04 09:08:50
4 2022-06-04 09:07:47 2022-06-04 09:08:10
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:06:37 2022-06-04 09:06:47
4 2022-06-04 09:06:29 2022-06-04 09:06:37
4 2022-06-04 09:06:17 2022-06-04 09:06:29
4 2022-06-04 09:05:53 2022-06-04 09:06:17
16 2022-06-04 09:04:33 2022-06-04 09:05:53

To this:
Input Start End
5 2022-06-04 09:09:00 2022-06-04 09:09:29
4 2022-06-04 09:07:47 2022-06-04 09:09:00
17 2022-06-04 09:06:47 2022-06-04 09:07:47
4 2022-06-04 09:05:53 2022-06-04 09:06:47
16 2022-06-04 09:04:33 2022-06-04 09:05:53

lag is indeed your friend here - assuming times is your table name

with times_cte as (select *, lag(input, 1) over () from times)
select input, start, end from times_cte where input != coalesce(lag, -1);

The coalesce to -1 is needed at the end to get the first row which has null
for the lag value because it's the first row.

John