Group by range in hour of day
I have a table with two timestamp columns for the start time and end time of each record (call them start and end).I'm trying to figure out if there is a way to group these records by "hour of day", that is the record should be included in the group if the hour of the day for the group falls anywhere in the range [start,end]. Obviously each record may well fall into multiple groups under this scenario.
The goal here is to figure out, for each hour of the day, a) what is the total number of "active" records for that hour, and b) what is the total "active" time for those records during the hour, with an ultimate goal of figuring out the average active time per record per hour.
So, for simplified example, if the table contained three records:
start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30
Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5
I can then easily manipulate these values to get my ultimate goal of the average, which would of course always be less than or equal to 1. Is this doable in postgress? Or would it be a better idea to simply pull the raw data and post-process in code? Thanks.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
Attachments:
Israel Brewster.vcftext/directory; name="Israel Brewster.vcf"Download
On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.So, for simplified example, if the table contained three records:
start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.
Do not have an answer for you, but a question:
What version of Postgres are you on?
This will help determine what tools are available to work with.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day",
I think you can do this by selecting `FROM generate_series(0, 23) s(h)`
and then joining to your table based on `h BETWEEN start AND end`.
Whenever I need to write a time-series aggregate query I reach for
generate_series. Mostly that's so I have output rows even when COUNT(*)
would be 0, but here it also means that a row from your data can feed
into multiple output rows.
I could probably write this out in more detail if you like, but that's
the short version. :-)
Good luck!
Paul
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.So, for simplified example, if the table contained three records:
start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.Do not have an answer for you, but a question:
What version of Postgres are you on?
This will help determine what tools are available to work with.
The following will give you endpoints for your bounds. Version is
important since "range types" could be very useful in this situation - but
you'd still need to generate the bounds info regardless.
SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz,
'2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM
generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1
hour'::interval) e (end_ts)) AS e
You would join this using an ON condition with an OR (start BETWEEN [...]
OR end BETWEEN [...]) - range logic will be better and you may want to
adjust the upper bound by negative 1 (nano-second?) to allow for easier
"<=" logic if using BETWEEN.
David J.
On Mar 16, 2015, at 2:13 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.So, for simplified example, if the table contained three records:
start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.Do not have an answer for you, but a question:
What version of Postgres are you on?
This will help determine what tools are available to work with.
Oh, right. Of course. I'm on 9.4.0
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-------------------------------------------------
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mar 16, 2015, at 2:16 PM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day",I think you can do this by selecting `FROM generate_series(0, 23) s(h)` and then joining to your table based on `h BETWEEN start AND end`.
Whenever I need to write a time-series aggregate query I reach for generate_series. Mostly that's so I have output rows even when COUNT(*) would be 0, but here it also means that a row from your data can feed into multiple output rows.
I could probably write this out in more detail if you like, but that's the short version. :-)
I think I can work with that :-) Hadn't considered doing a join there, so that's a new approach I can investigate. Thanks!
Good luck!
Paul
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Assuming 3 things
Table name - test
Column names - start_time, end_time
Added an id column (int) to distinguish each record in the table
You can go with this..... (my apologies for formatting issues)
with
slots as (
select *
from generate_series(0,1439) as s(slot)
),
slots_hours as (
select slot,
slot / 60 as hour
from slots
),
minutes as (
select id,
date_part('hour', start_time) * 60 + date_part('minute',
start_time) as start_minute,
date_part('hour', end_time) * 60 + date_part('minute',
end_time) as end_minute
from test
),
minute_slots as (
select id,
slot,
hour
from minutes
join slots_hours
on minutes.start_minute <= slots_hours.slot
and minutes.end_minute > slots_hours.slot
)
select hour,
count(*) / 60.0 as sum,
count(distinct id) as count
from minute_slots
group by hour
I'm certain there are more elegant solutions possible - but you can grasp
each step this way.
John
On Mon, Mar 16, 2015 at 2:57 PM, Israel Brewster <israel@ravnalaska.net>
wrote:
Show quoted text
I have a table with two timestamp columns for the start time and end time
of each record (call them start and end).I'm trying to figure out if there
is a way to group these records by "hour of day", that is the record should
be included in the group if the hour of the day for the group falls
anywhere in the range [start,end]. Obviously each record may well fall into
multiple groups under this scenario.The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal of
figuring out the average active time per record per hour.So, for simplified example, if the table contained three records:
start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is this
doable in postgress? Or would it be a better idea to simply pull the raw
data and post-process in code? Thanks.-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
On Mar 16, 2015, at 2:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.So, for simplified example, if the table contained three records:
start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.Do not have an answer for you, but a question:
What version of Postgres are you on?
This will help determine what tools are available to work with.
The following will give you endpoints for your bounds. Version is important since "range types" could be very useful in this situation - but you'd still need to generate the bounds info regardless.
SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) e (end_ts)) AS eYou would join this using an ON condition with an OR (start BETWEEN [...] OR end BETWEEN [...]) - range logic will be better and you may want to adjust the upper bound by negative 1 (nano-second?) to allow for easier "<=" logic if using BETWEEN.
Thanks, that is very helpful, but are you sure CROSS JOIN is what you wanted here? using that, I get a 625 row result set where each row from the first SELECT is paired up with EVERY row from the second select. I would think I would want the first row of the first SELECT paired up with only the first row of the second, second row of the first paired with the second row of the second, etc - i.e. 24 start and end bounds. Or am I missing something?
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
Show quoted text
David J.
On 03/16/2015 04:16 PM, Israel Brewster wrote:
On Mar 16, 2015, at 2:22 PM, David G. Johnston
<david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>wrote:On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time
and end
time of each record (call them start and end).I'm trying to
figure out
if there is a way to group these records by "hour of day",
that is the
record should be included in the group if the hour of the day
for the
group falls anywhere in the range [start,end]. Obviously each
record may
well fall into multiple groups under this scenario.The goal here is to figure out, for each hour of the day, a)
what is the
total number of "active" records for that hour, and b) what is
the total
"active" time for those records during the hour, with an
ultimate goal
of figuring out the average active time per record per hour.So, for simplified example, if the table contained three records:
start | end
------------------------------__-----------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5I can then easily manipulate these values to get my ultimate
goal of the
average, which would of course always be less than or equal to
1. Is
this doable in postgress? Or would it be a better idea to
simply pull
the raw data and post-process in code? Thanks.Do not have an answer for you, but a question:
What version of Postgres are you on?
This will help determine what tools are available to work with.
The following will give you endpoints for your bounds. Version is
important since "range types" could be very useful in this situation -
but you'd still need to generate the bounds info regardless.
SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz,
'2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM
generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
'1 hour'::interval) e (end_ts)) AS eYou would join this using an ON condition with an OR (start BETWEEN
[...] OR end BETWEEN [...]) - range logic will be better and you may
want to adjust the upper bound by negative 1 (nano-second?) to allow
for easier "<=" logic if using BETWEEN.Thanks, that is very helpful, but are you sure CROSS JOIN is what you
wanted here? using that, I get a 625 row result set where each row from
the first SELECT is paired up with EVERY row from the second select. I
would think I would want the first row of the first SELECT paired up
with only the first row of the second, second row of the first paired
with the second row of the second, etc - i.e. 24 start and end bounds.
Or am I missing something?
Given this:
test=> select * from start_end ;
id | start_time | end_time
----+------------------------+------------------------
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07
using Pauls hints I got:
test=> select * from start_end, generate_series(0, 23) as s(h) where h
between extract(hour from start_time) and extract(hour from end_time) ;
id | start_time | end_time | h
----+------------------------+------------------------+----
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11
test=> select h, count(*) from start_end, generate_series(0, 23) as s(h)
where h between extract(hour from start_time) and extract(hour from
end_time) group by h order by h;
h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------David J.
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mon, Mar 16, 2015 at 4:16 PM, Israel Brewster <israel@ravnalaska.net>
wrote:
On Mar 16, 2015, at 2:22 PM, David G. Johnston <david.g.johnston@gmail.com>
wrote:On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.So, for simplified example, if the table contained three records:
start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30Then the results should break out something like this:
hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.Do not have an answer for you, but a question:
What version of Postgres are you on?
This will help determine what tools are available to work with.
The following will give you endpoints for your bounds. Version is
important since "range types" could be very useful in this situation - but
you'd still need to generate the bounds info regardless.
SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz,
'2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM
generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1
hour'::interval) e (end_ts)) AS eYou would join this using an ON condition with an OR (start BETWEEN [...]
OR end BETWEEN [...]) - range logic will be better and you may want to
adjust the upper bound by negative 1 (nano-second?) to allow for easier
"<=" logic if using BETWEEN.Thanks, that is very helpful, but are you sure CROSS JOIN is what you
wanted here? using that, I get a 625 row result set where each row from the
first SELECT is paired up with EVERY row from the second select. I would
think I would want the first row of the first SELECT paired up with only
the first row of the second, second row of the first paired with the second
row of the second, etc - i.e. 24 start and end bounds. Or am I missing
something?
No, I rushed things...:( Sorry. My concept is good though but indeed you
want to end up with a table having only 24 rows (for the sample).
LATERAL may work here but I haven't had a chance to play with it yet. A
simple ordinal column to join on would be sufficient.
David J.
Attachments:
Israel Brewster.vcftext/directory; name="Israel Brewster.vcf"Download
test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;
h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2
Note if you always want all 24 rows with a count of 0 when appropriate
(which seems common in reports with tables or plots), you can just tweak
the above query to use a left join: FROM generate_series(0, 23) AS s(h)
LEFT OUTER JOIN start_end ON h BETWEEN ...
Paul
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;
h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or plots), you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN start_end ON h BETWEEN ...
Paul
Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same example that came up with this result set:
Given this:
test=> select * from start_end ;
id | start_time | end_time
----+------------------------+------------------------
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07
The specified query returns this:
h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2
Which is an excellent start, but I also need one more column, which is the total "active" time per hour. So given the intermediate result of this:
id | start_time | end_time | h
----+------------------------+------------------------+----
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8
1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11
I'd want a final result of this:
h | count | sum
----+---------------
8 | 2 | 1.75 (or 1:45:00 or whatever)
9 | 3 | 2.33 (2:20:00)
10 | 2 | 2.00 (2:00:00)
11 | 2 | 0.83 (0:50:00)
Where the 1:45 in the 8 hour is based on 45 minutes from row id 1 [8:15-9:00) plus the full hour [08:00-9:00) from row id 3, the hour 9 value is based on the amount of rows 1,2 and 3 that fall within the 9 hour, etc.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tuesday, March 17, 2015, Israel Brewster <israel@ravnalaska.net> wrote:
On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com
<javascript:;>> wrote:
test=> select h, count(*) from start_end, generate_series(0, 23) as
s(h) where h between extract(hour from start_time) and extract(hour from
end_time) group by h order by h;h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2Note if you always want all 24 rows with a count of 0 when appropriate
(which seems common in reports with tables or plots), you can just tweak
the above query to use a left join: FROM generate_series(0, 23) AS s(h)
LEFT OUTER JOIN start_end ON h BETWEEN ...Paul
Right, thanks. That makes sense. So next question: how do I get the
"active" time per hour from this? To use the same example that came up with
this result set:
Which is why you do not (only?) want to convert your data to hour-of-day
but want to create timestamp end points. Then you simply do timestamp
subtraction to get durations which you can then sum together.
David J.
On Mar 17, 2015, at 9:05 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, March 17, 2015, Israel Brewster <israel@ravnalaska.net <mailto:israel@ravnalaska.net>> wrote:
On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com <javascript:;>> wrote:
test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;
h | count
----+-------
8 | 2
9 | 3
10 | 2
11 | 2Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or plots), you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN start_end ON h BETWEEN ...
Paul
Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same example that came up with this result set:
Which is why you do not (only?) want to convert your data to hour-of-day but want to create timestamp end points. Then you simply do timestamp subtraction to get durations which you can then sum together.
Well, it's not QUITE that simple. For example, row id 3 which starts at 08:00:00 and ends at 11:45:00 in the example. If I have a timestamp endpoint of 10:00:00 for the 9 hour, and I just do simple timestamp subtraction, I'll get an interval of 2 (10:00:00 - 08:00:00), which is not correct since there can't be more than an hour in any given hour. Similarly for the 11 hour and either of the two matching rows - since they end during the hour in question (row 2 only contributes 5 minutes), I'd actually need to subtract the end_time from the start point in that case to get the time.
That said, the concept is sound, and I am fairly sure I can make it work using a case when statement to handle the various permutations of starting before and or ending after the hour in question. I'll work on that, but if there is a more elegant solution, I'm all ears :-)
Show quoted text
David J.
So next question: how do I get the "active" time per hour from this?
I think you just SUM() over the intersection between each hourly window
and each event, right? This might be easiest using tsrange, something
like this:
SUM(extract(minutes from (tsrange(start_time, end_time) &&
tsrange(h, h + interval '1 hour'))::interval))
I think you'll have to implement ::interval yourself though, e.g. here:
http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
Also as mentioned you'll have to convert h from an integer [0,23] to a
timestamp, but that seems pretty easy. Assuming start_time and end_time
are UTC that's just adding that many hours to UTC midnight of the same day.
Some weird edge cases to be careful about: activities that cross
midnight. Activities that last more than one full day, e.g. start 3/15
and end 3/17.
Paul
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
So next question: how do I get the "active" time per hour from this?
I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest using tsrange, something like this:
Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
I think you'll have to implement ::interval yourself though, e.g. here:
http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
Gotcha
Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy. Assuming start_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day.
Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full day, e.g. start 3/15 and end 3/17.
Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
Paul
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 03/17/2015 10:57 AM, Israel Brewster wrote:
On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
So next question: how do I get the "active" time per hour from this?
I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest using tsrange, something like this:
Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
I think you'll have to implement ::interval yourself though, e.g. here:
http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
Gotcha
My take on this is using CASE.
Rough sketch:
WHEN
date_trunc('hour', end_time) < h
THEN
end_time - start_time
ELSE
(date_trunc('hour', start_time) + interval '1 hr') - start_time
as
active_time
Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy. Assuming start_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day.
Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full day, e.g. start 3/15 and end 3/17.
Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Some weird edge cases to be careful about: activities that cross midnight.
Activities that last more than one full day,
e.g. start 3/15 and end 3/17.Right. And I will run into some of those (at least the crossing midnight),
so I'll keep an eye out.
If you are running the report on more than one day at a time, I think
David Johnston is right that you want to convert from integers [0, 23]
to timestamps as soon as possible, possibly even just generate a series
of timestamps rather than integers right from the beginning. Also beware
of extract(hour from foo). Probably you want tsrange intersection as
your join condition rather than BETWEEN.
Paul
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, Mar 17, 2015, at 11:30 AM, Adrian Klaver wrote:
On 03/17/2015 10:57 AM, Israel Brewster wrote:
On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
So next question: how do I get the "active" time per hour from this?
I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest using tsrange, something like this:
Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
I think you'll have to implement ::interval yourself though, e.g. here:
http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
Gotcha
My take on this is using CASE.
Rough sketch:
WHEN
date_trunc('hour', end_time) < h
THEN
end_time - start_time
ELSE
(date_trunc('hour', start_time) + interval '1 hr') - start_time
as
active_time
Aah, should be
WHEN
date_trunc('hour', end_time) < h + 1
THEN
end_time - start_time
ELSE
(date_trunc('hour', start_time) + interval '1 hr') - start_time
as
active_time
Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy. Assuming start_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day.
Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full day, e.g. start 3/15 and end 3/17.
Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-------------------------------------------------
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general