Group by range in hour of day

Started by Israel Brewsterabout 11 years ago23 messagesgeneral
Jump to latest
#1Israel Brewster
israel@ravnalaska.net

I have a table with two timestamp columns for the start time and end time of each record (call them start and end).I'm trying to figure out if there is a way to group these records by "hour of day", that is the record should be included in the group if the hour of the day for the group falls anywhere in the range [start,end]. Obviously each record may well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the total number of "active" records for that hour, and b) what is the total "active" time for those records during the hour, with an ultimate goal of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the average, which would of course always be less than or equal to 1. Is this doable in postgress? Or would it be a better idea to simply pull the raw data and post-process in code? Thanks.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

Attachments:

Israel Brewster.vcftext/directory; name="Israel Brewster.vcf"Download
#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Israel Brewster (#1)
Re: Group by range in hour of day

On 03/16/2015 02:57 PM, Israel Brewster wrote:

I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Israel Brewster (#1)
Re: Group by range in hour of day

I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day",

I think you can do this by selecting `FROM generate_series(0, 23) s(h)`
and then joining to your table based on `h BETWEEN start AND end`.

Whenever I need to write a time-series aggregate query I reach for
generate_series. Mostly that's so I have output rows even when COUNT(*)
would be 0, but here it also means that a row from your data can feed
into multiple output rows.

I could probably write this out in more detail if you like, but that's
the short version. :-)

Good luck!

Paul

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Adrian Klaver (#2)
Re: Group by range in hour of day

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 03/16/2015 02:57 PM, Israel Brewster wrote:

I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

​The following will give you endpoints for your bounds. Version is
important since "range types" could be very useful in this situation - but
you'd still need to generate the bounds info regardless.​


SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz,
'2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM
generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1
hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN [...]
OR end BETWEEN [...]) - range logic will be better and you may want to
adjust the upper bound by negative 1 (nano-second?) to allow for easier
"<=" logic if using BETWEEN.

​David J.​

#5Israel Brewster
israel@ravnalaska.net
In reply to: Adrian Klaver (#2)
Re: Group by range in hour of day

On Mar 16, 2015, at 2:13 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 03/16/2015 02:57 PM, Israel Brewster wrote:

I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

Oh, right. Of course. I'm on 9.4.0

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Israel Brewster
israel@ravnalaska.net
In reply to: Paul Jungwirth (#3)
Re: Group by range in hour of day

On Mar 16, 2015, at 2:16 PM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:

I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day",

I think you can do this by selecting `FROM generate_series(0, 23) s(h)` and then joining to your table based on `h BETWEEN start AND end`.

Whenever I need to write a time-series aggregate query I reach for generate_series. Mostly that's so I have output rows even when COUNT(*) would be 0, but here it also means that a row from your data can feed into multiple output rows.

I could probably write this out in more detail if you like, but that's the short version. :-)

I think I can work with that :-) Hadn't considered doing a join there, so that's a new approach I can investigate. Thanks!

Good luck!

Paul

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7John W Higgins
wishdev@gmail.com
In reply to: Israel Brewster (#1)
Re: Group by range in hour of day

Assuming 3 things

Table name - test
Column names - start_time, end_time
Added an id column (int) to distinguish each record in the table

You can go with this..... (my apologies for formatting issues)

with
slots as (
select *
from generate_series(0,1439) as s(slot)
),
slots_hours as (
select slot,
slot / 60 as hour
from slots
),
minutes as (
select id,
date_part('hour', start_time) * 60 + date_part('minute',
start_time) as start_minute,
date_part('hour', end_time) * 60 + date_part('minute',
end_time) as end_minute
from test
),
minute_slots as (
select id,
slot,
hour
from minutes
join slots_hours
on minutes.start_minute <= slots_hours.slot
and minutes.end_minute > slots_hours.slot
)
select hour,
count(*) / 60.0 as sum,
count(distinct id) as count
from minute_slots
group by hour

I'm certain there are more elegant solutions possible - but you can grasp
each step this way.

John

On Mon, Mar 16, 2015 at 2:57 PM, Israel Brewster <israel@ravnalaska.net>
wrote:

Show quoted text

I have a table with two timestamp columns for the start time and end time
of each record (call them start and end).I'm trying to figure out if there
is a way to group these records by "hour of day", that is the record should
be included in the group if the hour of the day for the group falls
anywhere in the range [start,end]. Obviously each record may well fall into
multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal of
figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is this
doable in postgress? Or would it be a better idea to simply pull the raw
data and post-process in code? Thanks.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

#8Israel Brewster
israel@ravnalaska.net
In reply to: David G. Johnston (#4)
Re: Group by range in hour of day

On Mar 16, 2015, at 2:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

​The following will give you endpoints for your bounds. Version is important since "range types" could be very useful in this situation - but you'd still need to generate the bounds info regardless.​

​SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN [...] OR end BETWEEN [...]) - range logic will be better and you may want to adjust the upper bound by negative 1 (nano-second?) to allow for easier "<=" logic if using BETWEEN.

Thanks, that is very helpful, but are you sure CROSS JOIN is what you wanted here? using that, I get a 625 row result set where each row from the first SELECT is paired up with EVERY row from the second select. I would think I would want the first row of the first SELECT paired up with only the first row of the second, second row of the first paired with the second row of the second, etc - i.e. 24 start and end bounds. Or am I missing something?

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

Show quoted text

​David J.​

#9Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Israel Brewster (#8)
Re: Group by range in hour of day

On 03/16/2015 04:16 PM, Israel Brewster wrote:

On Mar 16, 2015, at 2:22 PM, David G. Johnston
<david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>wrote:

On 03/16/2015 02:57 PM, Israel Brewster wrote:

I have a table with two timestamp columns for the start time
and end
time of each record (call them start and end).I'm trying to
figure out
if there is a way to group these records by "hour of day",
that is the
record should be included in the group if the hour of the day
for the
group falls anywhere in the range [start,end]. Obviously each
record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a)
what is the
total number of "active" records for that hour, and b) what is
the total
"active" time for those records during the hour, with an
ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
------------------------------__-----------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate
goal of the
average, which would of course always be less than or equal to
1. Is
this doable in postgress? Or would it be a better idea to
simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

​The following will give you endpoints for your bounds. Version is
important since "range types" could be very useful in this situation -
but you'd still need to generate the bounds info regardless.​


SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz,
'2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM
generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
'1 hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN
[...] OR end BETWEEN [...]) - range logic will be better and you may
want to adjust the upper bound by negative 1 (nano-second?) to allow
for easier "<=" logic if using BETWEEN.

Thanks, that is very helpful, but are you sure CROSS JOIN is what you
wanted here? using that, I get a 625 row result set where each row from
the first SELECT is paired up with EVERY row from the second select. I
would think I would want the first row of the first SELECT paired up
with only the first row of the second, second row of the first paired
with the second row of the second, etc - i.e. 24 start and end bounds.
Or am I missing something?

Given this:

test=> select * from start_end ;
id | start_time | end_time
----+------------------------+------------------------
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07

using Pauls hints I got:

test=> select * from start_end, generate_series(0, 23) as s(h) where h
between extract(hour from start_time) and extract(hour from end_time) ;

id | start_time | end_time | h
----+------------------------+------------------------+----
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11

test=> select h, count(*) from start_end, generate_series(0, 23) as s(h)
where h between extract(hour from start_time) and extract(hour from
end_time) group by h order by h;

h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

​David J.​

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10David G. Johnston
david.g.johnston@gmail.com
In reply to: Israel Brewster (#8)
Re: Group by range in hour of day

On Mon, Mar 16, 2015 at 4:16 PM, Israel Brewster <israel@ravnalaska.net>
wrote:

On Mar 16, 2015, at 2:22 PM, David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 03/16/2015 02:57 PM, Israel Brewster wrote:

I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

​The following will give you endpoints for your bounds. Version is
important since "range types" could be very useful in this situation - but
you'd still need to generate the bounds info regardless.​


SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz,
'2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM
generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1
hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN [...]
OR end BETWEEN [...]) - range logic will be better and you may want to
adjust the upper bound by negative 1 (nano-second?) to allow for easier
"<=" logic if using BETWEEN.

Thanks, that is very helpful, but are you sure CROSS JOIN is what you
wanted here? using that, I get a 625 row result set where each row from the
first SELECT is paired up with EVERY row from the second select. I would
think I would want the first row of the first SELECT paired up with only
the first row of the second, second row of the first paired with the second
row of the second, etc - i.e. 24 start and end bounds. Or am I missing
something?

​No, I rushed things...:( Sorry. My concept is good though but indeed you
want to end up with a table having only 24 rows (for the sample).

LATERAL may work here but I haven't had a chance to play with it yet. A
simple ordinal column to join on would be sufficient.

David J.​

#11Israel Brewster
israel@ravnalaska.net
In reply to: Adrian Klaver (#9)
Re: Group by range in hour of day

Attachments:

Israel Brewster.vcftext/directory; name="Israel Brewster.vcf"Download
#12Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Israel Brewster (#11)
Re: Group by range in hour of day

test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;

h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2

Note if you always want all 24 rows with a count of 0 when appropriate
(which seems common in reports with tables or plots), you can just tweak
the above query to use a left join: FROM generate_series(0, 23) AS s(h)
LEFT OUTER JOIN start_end ON h BETWEEN ...

Paul

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Israel Brewster
israel@ravnalaska.net
In reply to: Paul Jungwirth (#12)
Re: Group by range in hour of day

On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:

test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;

h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2

Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or plots), you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN start_end ON h BETWEEN ...

Paul

Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same example that came up with this result set:

Given this:

test=> select * from start_end ;
id | start_time | end_time
----+------------------------+------------------------
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07

The specified query returns this:

h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2

Which is an excellent start, but I also need one more column, which is the total "active" time per hour. So given the intermediate result of this:

id | start_time | end_time | h
----+------------------------+------------------------+----
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11

I'd want a final result of this:

h | count | sum
----+---------------
8 | 2 | 1.75 (or 1:45:00 or whatever)
9 | 3 | 2.33 (2:20:00)
10 | 2 | 2.00 (2:00:00)
11 | 2 | 0.83 (0:50:00)

Where the 1:45 in the 8 hour is based on 45 minutes from row id 1 [8:15-9:00) plus the full hour [08:00-9:00) from row id 3, the hour 9 value is based on the amount of rows 1,2 and 3 that fall within the 9 hour, etc.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14David G. Johnston
david.g.johnston@gmail.com
In reply to: Israel Brewster (#13)
Re: Group by range in hour of day

On Tuesday, March 17, 2015, Israel Brewster <israel@ravnalaska.net> wrote:

On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com

<javascript:;>> wrote:

test=> select h, count(*) from start_end, generate_series(0, 23) as

s(h) where h between extract(hour from start_time) and extract(hour from
end_time) group by h order by h;

h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2

Note if you always want all 24 rows with a count of 0 when appropriate

(which seems common in reports with tables or plots), you can just tweak
the above query to use a left join: FROM generate_series(0, 23) AS s(h)
LEFT OUTER JOIN start_end ON h BETWEEN ...

Paul

Right, thanks. That makes sense. So next question: how do I get the
"active" time per hour from this? To use the same example that came up with
this result set:

Which is why you do not (only?) want to convert your data to hour-of-day
but want to create timestamp end points. Then you simply do timestamp
subtraction to get durations which you can then sum together.

David J.

#15Israel Brewster
israel@ravnalaska.net
In reply to: David G. Johnston (#14)
Re: Group by range in hour of day

On Mar 17, 2015, at 9:05 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tuesday, March 17, 2015, Israel Brewster <israel@ravnalaska.net <mailto:israel@ravnalaska.net>> wrote:

On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com <javascript:;>> wrote:

test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;

h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2

Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or plots), you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN start_end ON h BETWEEN ...

Paul

Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same example that came up with this result set:

Which is why you do not (only?) want to convert your data to hour-of-day but want to create timestamp end points. Then you simply do timestamp subtraction to get durations which you can then sum together.

Well, it's not QUITE that simple. For example, row id 3 which starts at 08:00:00 and ends at 11:45:00 in the example. If I have a timestamp endpoint of 10:00:00 for the 9 hour, and I just do simple timestamp subtraction, I'll get an interval of 2 (10:00:00 - 08:00:00), which is not correct since there can't be more than an hour in any given hour. Similarly for the 11 hour and either of the two matching rows - since they end during the hour in question (row 2 only contributes 5 minutes), I'd actually need to subtract the end_time from the start point in that case to get the time.

That said, the concept is sound, and I am fairly sure I can make it work using a case when statement to handle the various permutations of starting before and or ending after the hour in question. I'll work on that, but if there is a more elegant solution, I'm all ears :-)

Show quoted text

David J.

#16Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Israel Brewster (#13)
Re: Group by range in hour of day

So next question: how do I get the "active" time per hour from this?

I think you just SUM() over the intersection between each hourly window
and each event, right? This might be easiest using tsrange, something
like this:

SUM(extract(minutes from (tsrange(start_time, end_time) &&
tsrange(h, h + interval '1 hour'))::interval))

I think you'll have to implement ::interval yourself though, e.g. here:

http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange

Also as mentioned you'll have to convert h from an integer [0,23] to a
timestamp, but that seems pretty easy. Assuming start_time and end_time
are UTC that's just adding that many hours to UTC midnight of the same day.

Some weird edge cases to be careful about: activities that cross
midnight. Activities that last more than one full day, e.g. start 3/15
and end 3/17.

Paul

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17Israel Brewster
israel@ravnalaska.net
In reply to: Paul Jungwirth (#16)
Re: Group by range in hour of day

On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:

So next question: how do I get the "active" time per hour from this?

I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest using tsrange, something like this:

Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.

SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))

I think you'll have to implement ::interval yourself though, e.g. here:

http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange

Gotcha

Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy. Assuming start_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day.

Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full day, e.g. start 3/15 and end 3/17.

Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

Paul

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#18Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Israel Brewster (#17)
Re: Group by range in hour of day

On 03/17/2015 10:57 AM, Israel Brewster wrote:

On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:

So next question: how do I get the "active" time per hour from this?

I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest using tsrange, something like this:

Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.

SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))

I think you'll have to implement ::interval yourself though, e.g. here:

http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange

Gotcha

My take on this is using CASE.

Rough sketch:

WHEN
date_trunc('hour', end_time) < h
THEN
end_time - start_time
ELSE
(date_trunc('hour', start_time) + interval '1 hr') - start_time
as
active_time

Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy. Assuming start_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day.

Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full day, e.g. start 3/15 and end 3/17.

Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#19Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Israel Brewster (#17)
Re: Group by range in hour of day

Some weird edge cases to be careful about: activities that cross midnight.
Activities that last more than one full day,
e.g. start 3/15 and end 3/17.

Right. And I will run into some of those (at least the crossing midnight),
so I'll keep an eye out.

If you are running the report on more than one day at a time, I think
David Johnston is right that you want to convert from integers [0, 23]
to timestamps as soon as possible, possibly even just generate a series
of timestamps rather than integers right from the beginning. Also beware
of extract(hour from foo). Probably you want tsrange intersection as
your join condition rather than BETWEEN.

Paul

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#20Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Adrian Klaver (#18)
Re: Group by range in hour of day

--
Adrian Klaver
adrian.klaver@aklaver.com

On Tue, Mar 17, 2015, at 11:30 AM, Adrian Klaver wrote:

On 03/17/2015 10:57 AM, Israel Brewster wrote:

On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:

So next question: how do I get the "active" time per hour from this?

I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest using tsrange, something like this:

Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.

SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))

I think you'll have to implement ::interval yourself though, e.g. here:

http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange

Gotcha

My take on this is using CASE.

Rough sketch:

WHEN
date_trunc('hour', end_time) < h
THEN
end_time - start_time
ELSE
(date_trunc('hour', start_time) + interval '1 hr') - start_time
as
active_time

Aah, should be

WHEN
date_trunc('hour', end_time) < h + 1
THEN
end_time - start_time
ELSE
(date_trunc('hour', start_time) + interval '1 hr') - start_time
as
active_time

Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy. Assuming start_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day.

Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full day, e.g. start 3/15 and end 3/17.

Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#21Marc Mamin
M.Mamin@intershop.de
In reply to: Adrian Klaver (#20)
#22Marc Mamin
M.Mamin@intershop.de
In reply to: Marc Mamin (#21)
#23Israel Brewster
israel@ravnalaska.net
In reply to: Marc Mamin (#22)