Good candidate query for window syntax?

Started by Ketemaover 15 years ago5 messagesgeneral
Jump to latest
#1Ketema
ketema@gmail.com

Hello, I have a table defined as:

CREATE TABLE demo AS
(
id serial PRIMARY KEY,
start_time timestamp without timezone,
duration integer
)

A sample data set I am working with is:

start_time | duration | end_time
---------------------+----------+---------------------
2006-08-28 16:55:11 | 94 | 2006-08-28 16:56:45
2006-08-28 16:56:00 | 63 | 2006-08-28 16:57:03
2006-08-28 16:56:02 | 25 | 2006-08-28 16:56:27
2006-08-28 16:56:20 | 11 | 2006-08-28 16:56:31
2006-08-28 16:56:20 | 76 | 2006-08-28 16:57:36
2006-08-28 16:56:29 | 67 | 2006-08-28 16:57:36
2006-08-28 16:56:45 | 21 | 2006-08-28 16:57:06
2006-08-28 16:56:50 | 44 | 2006-08-28 16:57:34
2006-08-28 16:56:50 | 36 | 2006-08-28 16:57:26
2006-08-28 16:56:53 | 26 | 2006-08-28 16:57:19
2006-08-28 16:56:57 | 55 | 2006-08-28 16:57:52
2006-08-28 16:57:28 | 1 | 2006-08-28 16:57:29
2006-08-28 16:57:42 | 17 | 2006-08-28 16:57:59
2006-08-28 16:57:46 | 28 | 2006-08-28 16:58:14
2006-08-28 16:58:25 | 51 | 2006-08-28 16:59:16
2006-08-28 16:58:31 | 20 | 2006-08-28 16:58:51
2006-08-28 16:58:35 | 27 | 2006-08-28 16:59:02

generated by the query:
SELECT start_time, duration, to_timestamp((extract(epoch from start_time) + duration))::timestamp as end_time
FROM demo
ORDER BY start_time, duration, 3;

My goal is: To find the maximum number of concurrent rows over an arbitrary interval. Concurrent is defined as overlapping in their duration. Example from the set above: Assume the desired interval is one day. Rows 1 and 2 are concurrent because row 2's start_time is within the duration of row 1. If you go through the set the max concurrency is 5 (this is a guess cause I did it visually and may have miscounted). I took a scan of how I tried to solve it manually and attached the image. I tried using timelines to visualize the start, duration, and end of each row then looked for where they overlapped.

Attachments:

intervals.jpegimage/jpeg; name=intervals.jpegDownload
#2Jorge Godoy
jgodoy@gmail.com
In reply to: Ketema (#1)
Re: Good candidate query for window syntax?

Have you checked the OVERLAPS operator in the documentation?

http://www.postgresql.org/docs/8.4/interactive/functions-datetime.html

--
Jorge Godoy <jgodoy@gmail.com>

On Fri, Sep 10, 2010 at 10:03, Ketema Harris <ketema@gmail.com> wrote:

Show quoted text

Hello, I have a table defined as:

CREATE TABLE demo AS
(
id serial PRIMARY KEY,
start_time timestamp without timezone,
duration integer
)

A sample data set I am working with is:

start_time | duration | end_time
---------------------+----------+---------------------
2006-08-28 16:55:11 | 94 | 2006-08-28 16:56:45
2006-08-28 16:56:00 | 63 | 2006-08-28 16:57:03
2006-08-28 16:56:02 | 25 | 2006-08-28 16:56:27
2006-08-28 16:56:20 | 11 | 2006-08-28 16:56:31
2006-08-28 16:56:20 | 76 | 2006-08-28 16:57:36
2006-08-28 16:56:29 | 67 | 2006-08-28 16:57:36
2006-08-28 16:56:45 | 21 | 2006-08-28 16:57:06
2006-08-28 16:56:50 | 44 | 2006-08-28 16:57:34
2006-08-28 16:56:50 | 36 | 2006-08-28 16:57:26
2006-08-28 16:56:53 | 26 | 2006-08-28 16:57:19
2006-08-28 16:56:57 | 55 | 2006-08-28 16:57:52
2006-08-28 16:57:28 | 1 | 2006-08-28 16:57:29
2006-08-28 16:57:42 | 17 | 2006-08-28 16:57:59
2006-08-28 16:57:46 | 28 | 2006-08-28 16:58:14
2006-08-28 16:58:25 | 51 | 2006-08-28 16:59:16
2006-08-28 16:58:31 | 20 | 2006-08-28 16:58:51
2006-08-28 16:58:35 | 27 | 2006-08-28 16:59:02

generated by the query:
SELECT start_time, duration, to_timestamp((extract(epoch from start_time) +
duration))::timestamp as end_time
FROM demo
ORDER BY start_time, duration, 3;

My goal is: To find the maximum number of concurrent rows over an arbitrary
interval. Concurrent is defined as overlapping in their duration. Example
from the set above: Assume the desired interval is one day. Rows 1 and 2
are concurrent because row 2's start_time is within the duration of row 1.
If you go through the set the max concurrency is 5 (this is a guess cause I
did it visually and may have miscounted). I took a scan of how I tried to
solve it manually and attached the image. I tried using timelines to
visualize the start, duration, and end of each row then looked for where
they overlapped.

My desired output set would be:

max_concurrency | interval (in this case grouped by day)
--------------------+-----------------
5 | 2006-08-28

if the interval for this set were different, say 30 minutes, then I would
expect to see something like:
max_concurrency | interval
--------------------+--------------------------------------------
0 | 2006-08-28 00:00:00 - 2006-08-28 00:29:59
0 | 2006-08-28 00:30:00 - 2006-08-28 00:59:59
0 | 2006-08-28 01:00:00 - 2006-08-28 01:29:59
.......continues.....
0 | 2006-08-28 16:00:00 - 2006-08-28 16:29:59
5 | 2006-08-28 16:30:00 - 2006-08-28 16:59:59

I think that a query that involves a window could be used to solve this
question as the documentation says:
"A window function call represents the application of an aggregate-like
function over some portion of the rows selected by a query...the window
function is able to scan all the rows that would be part of the current
row's group according to the grouping specification...."
I am hoping that someone with more experience could help devise a way to do
this with a query. Thanks in advance.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Ketema
ketema@gmail.com
In reply to: Ketema (#1)
Re: Good candidate query for window syntax?

On Sep 10, 9:08 am, jgo...@gmail.com (Jorge Godoy) wrote:

Have you checked the OVERLAPS operator in the documentation?

http://www.postgresql.org/docs/8.4/interactive/functions-datetime.html

--
Jorge Godoy     <jgo...@gmail.com>

On Fri, Sep 10, 2010 at 10:03, Ketema Harris <ket...@gmail.com> wrote:

Hello,  I have a table defined as:

CREATE TABLE demo AS
(
       id serial PRIMARY KEY,
       start_time timestamp without timezone,
       duration integer
)

A sample data set I am working with is:

     start_time      | duration |       end_time
---------------------+----------+---------------------
 2006-08-28 16:55:11 |       94 | 2006-08-28 16:56:45
 2006-08-28 16:56:00 |       63 | 2006-08-28 16:57:03
 2006-08-28 16:56:02 |       25 | 2006-08-28 16:56:27
 2006-08-28 16:56:20 |       11 | 2006-08-28 16:56:31
 2006-08-28 16:56:20 |       76 | 2006-08-28 16:57:36
 2006-08-28 16:56:29 |       67 | 2006-08-28 16:57:36
 2006-08-28 16:56:45 |       21 | 2006-08-28 16:57:06
 2006-08-28 16:56:50 |       44 | 2006-08-28 16:57:34
 2006-08-28 16:56:50 |       36 | 2006-08-28 16:57:26
 2006-08-28 16:56:53 |       26 | 2006-08-28 16:57:19
 2006-08-28 16:56:57 |       55 | 2006-08-28 16:57:52
 2006-08-28 16:57:28 |        1 | 2006-08-28 16:57:29
 2006-08-28 16:57:42 |       17 | 2006-08-28 16:57:59
 2006-08-28 16:57:46 |       28 | 2006-08-28 16:58:14
 2006-08-28 16:58:25 |       51 | 2006-08-28 16:59:16
 2006-08-28 16:58:31 |       20 | 2006-08-28 16:58:51
 2006-08-28 16:58:35 |       27 | 2006-08-28 16:59:02

generated by the query:
SELECT start_time, duration, to_timestamp((extract(epoch from start_time) +
duration))::timestamp as end_time
FROM demo
ORDER BY start_time, duration, 3;

My goal is: To find the maximum number of concurrent rows over an arbitrary
interval.  Concurrent is defined as overlapping in their duration.  Example
from the set above: Assume the desired interval is one day.  Rows 1 and 2
are concurrent because row 2's start_time is within the duration of row 1.
 If you go through the set the max concurrency is 5 (this is a guess cause I
did it visually and may have miscounted). I took a scan of how I tried to
solve it manually and attached the image.  I tried using timelines to
visualize the start, duration, and end of each row then looked for where
they overlapped.

My desired output set would be:

max_concurrency     |     interval   (in this case grouped by day)
--------------------+-----------------
       5           |   2006-08-28

if the interval for this set were different, say 30 minutes, then I would
expect to see something like:
max_concurrency     |     interval
--------------------+--------------------------------------------
       0           |   2006-08-28 00:00:00 - 2006-08-28 00:29:59
       0           |   2006-08-28 00:30:00 - 2006-08-28 00:59:59
       0           |   2006-08-28 01:00:00 - 2006-08-28 01:29:59
                       .......continues.....
       0           |   2006-08-28 16:00:00 - 2006-08-28 16:29:59
       5           |   2006-08-28 16:30:00 - 2006-08-28 16:59:59

I think that a query that involves a window could be used to solve this
question as the documentation says:
"A window function call represents the application of an aggregate-like
function over some portion of the rows selected by a query...the window
function is able to scan all the rows that would be part of the current
row's group according to the grouping specification...."
I am hoping that someone with more experience could help devise a way to do
this with a query.  Thanks in advance.

--
Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I have just reviewed and thanks for reminding me that exists. I can
see that this could be useful, but I'm not quite putting it all
together. The overlaps operator takes two sets of start and end times
and tells you if they overlap, i need this comparison done over an
entire set......I started writing the following:

SELECT count(case when (start_time, end_time) overlaps ? is true then
1) as max_concurrency OVER w,
calldate::date as "interval"
OVER (PARTITION BY start_time::date ORDER BY start_time, end_time
desc) AS w
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
FROM cdr;

What would go on the right side of overlaps ?

I really have no idea if that is the right direction....

#4Ketema
ketema@gmail.com
In reply to: Ketema (#3)
Re: Good candidate query for window syntax?

Ok I have been flailing at trying to understand both syntax and concepts...I
think I am moving forward as I have a query that returns a result...its just
the wrong result....

SELECT count(*) OVER w as max_concurrency,
start_time::date as "interval"
FROM demo
GROUP BY start_time::date,
case
when
(
(start_time, to_timestamp((extract(epoch from start_time)
+ duration))::timestamp)
OVERLAPS
(start_time, to_timestamp((extract(epoch from start_time)
+ duration))::timestamp)
) = TRUE
then 1
end
WINDOW w AS
(
PARTITION BY
start_time::date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

The results are :

max_concurrency | interval
-----------------+--------------------------
1 | 2006-08-28
1 | 2010-09-09
1 | 2010-09-10
(3 rows)

The count is returning the count of the date, not if a rows interval
overlaps another rows. Also I recognize that I really want the max count
over the given interval. Any thoughts would be appreciated

On Fri, Sep 10, 2010 at 9:40 AM, Ketema <ketema@gmail.com> wrote:

Show quoted text

On Sep 10, 9:08 am, jgo...@gmail.com (Jorge Godoy) wrote:

Have you checked the OVERLAPS operator in the documentation?

http://www.postgresql.org/docs/8.4/interactive/functions-datetime.html

--
Jorge Godoy <jgo...@gmail.com>

On Fri, Sep 10, 2010 at 10:03, Ketema Harris <ket...@gmail.com> wrote:

Hello, I have a table defined as:

CREATE TABLE demo AS
(
id serial PRIMARY KEY,
start_time timestamp without timezone,
duration integer
)

A sample data set I am working with is:

start_time | duration | end_time
---------------------+----------+---------------------
2006-08-28 16:55:11 | 94 | 2006-08-28 16:56:45
2006-08-28 16:56:00 | 63 | 2006-08-28 16:57:03
2006-08-28 16:56:02 | 25 | 2006-08-28 16:56:27
2006-08-28 16:56:20 | 11 | 2006-08-28 16:56:31
2006-08-28 16:56:20 | 76 | 2006-08-28 16:57:36
2006-08-28 16:56:29 | 67 | 2006-08-28 16:57:36
2006-08-28 16:56:45 | 21 | 2006-08-28 16:57:06
2006-08-28 16:56:50 | 44 | 2006-08-28 16:57:34
2006-08-28 16:56:50 | 36 | 2006-08-28 16:57:26
2006-08-28 16:56:53 | 26 | 2006-08-28 16:57:19
2006-08-28 16:56:57 | 55 | 2006-08-28 16:57:52
2006-08-28 16:57:28 | 1 | 2006-08-28 16:57:29
2006-08-28 16:57:42 | 17 | 2006-08-28 16:57:59
2006-08-28 16:57:46 | 28 | 2006-08-28 16:58:14
2006-08-28 16:58:25 | 51 | 2006-08-28 16:59:16
2006-08-28 16:58:31 | 20 | 2006-08-28 16:58:51
2006-08-28 16:58:35 | 27 | 2006-08-28 16:59:02

generated by the query:
SELECT start_time, duration, to_timestamp((extract(epoch from

start_time) +

duration))::timestamp as end_time
FROM demo
ORDER BY start_time, duration, 3;

My goal is: To find the maximum number of concurrent rows over an

arbitrary

interval. Concurrent is defined as overlapping in their duration.

Example

from the set above: Assume the desired interval is one day. Rows 1 and

2

are concurrent because row 2's start_time is within the duration of row

1.

If you go through the set the max concurrency is 5 (this is a guess

cause I

did it visually and may have miscounted). I took a scan of how I tried

to

solve it manually and attached the image. I tried using timelines to
visualize the start, duration, and end of each row then looked for

where

they overlapped.

My desired output set would be:

max_concurrency | interval (in this case grouped by day)
--------------------+-----------------
5 | 2006-08-28

if the interval for this set were different, say 30 minutes, then I

would

expect to see something like:
max_concurrency | interval
--------------------+--------------------------------------------
0 | 2006-08-28 00:00:00 - 2006-08-28 00:29:59
0 | 2006-08-28 00:30:00 - 2006-08-28 00:59:59
0 | 2006-08-28 01:00:00 - 2006-08-28 01:29:59
.......continues.....
0 | 2006-08-28 16:00:00 - 2006-08-28 16:29:59
5 | 2006-08-28 16:30:00 - 2006-08-28 16:59:59

I think that a query that involves a window could be used to solve this
question as the documentation says:
"A window function call represents the application of an aggregate-like
function over some portion of the rows selected by a query...the window
function is able to scan all the rows that would be part of the current
row's group according to the grouping specification...."
I am hoping that someone with more experience could help devise a way

to do

this with a query. Thanks in advance.

--
Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I have just reviewed and thanks for reminding me that exists. I can
see that this could be useful, but I'm not quite putting it all
together. The overlaps operator takes two sets of start and end times
and tells you if they overlap, i need this comparison done over an
entire set......I started writing the following:

SELECT count(case when (start_time, end_time) overlaps ? is true then
1) as max_concurrency OVER w,
calldate::date as "interval"
OVER (PARTITION BY start_time::date ORDER BY start_time, end_time
desc) AS w
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
FROM cdr;

What would go on the right side of overlaps ?

I really have no idea if that is the right direction....

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Ketema (#1)
Re: Good candidate query for window syntax?

Ketema Harris <ketema@gmail.com> writes:

My goal is: To find the maximum number of concurrent rows over an
arbitrary interval.

My guess is that the following would help you:

http://wiki.postgresql.org/wiki/Range_aggregation

--
dim