time series data

Started by Khalil Khamlichi · over 8 years ago · 12 messages · general
#1Khalil Khamlichi
khamlichi.khalil@gmail.com

Hi everyone,

I have a data stream from a call center application coming into Postgres in
this format:

user_name, user_status, event_time

'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...

As you can see, each new insert of an "event" is in fact the start_time of
that event and also the end_time of the previous one, so it should be used
to calculate the duration of that previous event.

What is the best way to get user_status statistics such as total duration,
frequency, average, etc.? Does anybody have experience with this sort of
data stream?

Thanks in advance.

#2Melvin Davidson
melvin6925@gmail.com
In reply to: Khalil Khamlichi (#1)
Re: time series data


Just a suggestion, but here is what I would do.
First, create your tables similar to the following:

CREATE TABLE status
(
  call_status varchar(10) NOT NULL,
  CONSTRAINT status_pk PRIMARY KEY (call_status)
);

INSERT INTO status
  (call_status)
VALUES
  ('ready'),
  ('talking'),
  ('after_call');

CREATE TABLE user_sessions
(
  username name NOT NULL,
  session_id bigint NOT NULL,
  call_status varchar(10) NOT NULL,
  call_time timestamp NOT NULL,
  CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id, call_status),
  CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status (call_status)
);

Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:

CREATE TABLE current_session
(
  username name NOT NULL,
  session_id serial NOT NULL,
  CONSTRAINT current_session_pk PRIMARY KEY (username)
);

Then all you need to do is:
1. Update current_session and get the new session_id each time a user
connects (call_status = 'ready').
Probably best to use a BEFORE trigger to do this, but you will need to
code it yourself.
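[Editorial note: the trigger is left as an exercise above; a minimal sketch of such a BEFORE INSERT trigger might look like the following. The function and trigger names are illustrative, and the sequence name assumes the default name Postgres gives the serial column in current_session.]

```sql
CREATE OR REPLACE FUNCTION assign_session_id() RETURNS trigger AS $$
BEGIN
    IF NEW.call_status = 'ready' THEN
        -- A 'ready' event starts a new session: insert the user's row,
        -- or bump an existing one to the next sequence value.
        INSERT INTO current_session (username)
        VALUES (NEW.username)
        ON CONFLICT (username)
        DO UPDATE SET session_id = nextval('current_session_session_id_seq');
    END IF;

    -- Stamp every incoming event with the user's current session_id.
    SELECT session_id INTO NEW.session_id
    FROM current_session
    WHERE username = NEW.username;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_sessions_assign_session
BEFORE INSERT ON user_sessions
FOR EACH ROW EXECUTE PROCEDURE assign_session_id();
```

ON CONFLICT requires PostgreSQL 9.5 or later.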

2. You can then do (note the call_time values live in user_sessions, and
age() subtracts its second argument from its first, so the later timestamp
goes first; 'actual_user_name' and actual_session_id are placeholders):

SELECT username,
       age( (SELECT call_time FROM user_sessions
              WHERE username = 'actual_user_name'
                AND session_id = actual_session_id
                AND call_status = 'after_call'),
            (SELECT call_time FROM user_sessions
              WHERE username = 'actual_user_name'
                AND session_id = actual_session_id
                AND call_status = 'talking')
       ) AS duration;

You can use similar queries for avg and frequency.
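[Editorial note: as a sketch of those "similar queries", the per-session durations can be aggregated with a self-join, assuming one 'talking' and one 'after_call' row per (username, session_id); all literals are placeholders.]

```sql
-- Call count and average talking duration per user.
SELECT t.username,
       count(*)                           AS calls,
       avg(age(a.call_time, t.call_time)) AS avg_talking
FROM user_sessions t
JOIN user_sessions a
  ON a.username    = t.username
 AND a.session_id  = t.session_id
 AND a.call_status = 'after_call'
WHERE t.call_status = 'talking'
GROUP BY t.username;
```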

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#3Clifford Snow
clifford@snowandsnow.us
In reply to: Melvin Davidson (#2)
Re: time series data

I have a stream that updates every minute with a trigger that updates
another table with information from the stream. That way I'm constantly
updated with no need to run a script to update before I want a report.

Clifford


--
@osm_seattle
osm_seattle.snowandsnow.us
OpenStreetMap: Maps with a human touch

#4Melvin Davidson
melvin6925@gmail.com
In reply to: Clifford Snow (#3)
Re: time series data


I thought about the table design for user_sessions and came up
with a better one:

CREATE TABLE user_sessions
(
  username name NOT NULL,
  session_id bigint NOT NULL,
  call_status varchar(10) NOT NULL,
  call_ready timestamp NOT NULL,
  call_talking timestamp,
  call_after_call timestamp,
  call_duration interval,
  CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id),
  CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status (call_status)
);

So in essence, when the call starts, just do (session_id and call_status
must be supplied, since both columns are NOT NULL):

INSERT INTO user_sessions
  (username, session_id, call_status, call_ready)
VALUES
  ('actual_user_name', new_session_id, 'ready', now() );

Then
SELECT max(session_id) AS current_session
FROM user_sessions
WHERE username = 'actual_user_name';

When talking starts:
UPDATE user_sessions
SET call_status = 'talking',
call_talking = now()
WHERE username = 'actual_user_name'
AND session_id = current_session;

When call ends:
UPDATE user_sessions
SET call_status = 'after_call',
call_after_call = now()
WHERE username = 'actual_user_name'
AND session_id = current_session;

Now all you have to do to get call length is:

SELECT username,
age ( call_after_call, call_talking ) as duration
FROM user_sessions
WHERE username = 'actual_user_name'
AND session_id = current_session;

#5Scott Marlowe
scott.marlowe@gmail.com
In reply to: Khalil Khamlichi (#1)
Re: time series data


Have you looked at the temporal_tables extension? It seems custom-made for
what you're trying to do.

http://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Khalil Khamlichi
khamlichi.khalil@gmail.com
In reply to: Scott Marlowe (#5)
Re: time series data

Interesting proposition; I am reading the docs.


#7Khalil Khamlichi
khamlichi.khalil@gmail.com
In reply to: Melvin Davidson (#4)
Re: time series data

Hi Melvin,

Thanks a lot for your help,

let me explain to you my problem.

we have records like this

ccdb1=# select user_name, agent_status, event_time from cc_events ;

user_name | agent_status | event_time
-----------+--------------+---------------------
user1 | ready | 2017-01-01 10:00:00
user1 | talking | 2017-01-01 10:02:00
user1 | after_call | 2017-01-01 10:08:00
user1 | ready | 2017-01-01 10:10:00
user1 | talking | 2017-01-01 10:12:00
user1 | after_call | 2017-01-01 10:15:00
user1 | paused | 2017-01-01 10:17:00
user1 | ready | 2017-01-01 10:25:00
user1 | talking | 2017-01-01 10:26:00
(9 rows)

So user1 was ready at 2017-01-01 10:00:00, then he received a call
that he answered at 2017-01-01 10:02:00, and so on...
So user1 was ready for 2 minutes, then he was talking for 6 minutes,
then he was in after_call (doing after-call work) for 2 minutes, and
this is the kind of information we want to query.

My solution so far is that my table has one more field: end_time.
When an event comes in, before the insert I do:

update cc_events set end_time = current_timestamp
where user_name = 'user_of_event' and end_time is null;

Then I insert the new event leaving end_time as null, so that the next
event will update it, and so on.

It's working fine: I have the start and end times for each event, and it's
not too painful to query (sum(end_time - event_time) while grouping by
user_name, agent_status), but it's one more update on the table and it's
also limited in what you can query.
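[Editorial note: the update-then-insert step described here can be sketched as a single transaction, using the table and columns from the example above; the literal values stand in for the incoming event.]

```sql
BEGIN;

-- Close out the user's currently open event: its end_time is the
-- timestamp of the event that just arrived.
UPDATE cc_events
   SET end_time = '2017-01-01 10:02:00'
 WHERE user_name = 'user1'
   AND end_time IS NULL;

-- Open the new event, leaving end_time NULL for the next one to close.
INSERT INTO cc_events (user_name, agent_status, event_time)
VALUES ('user1', 'talking', '2017-01-01 10:02:00');

COMMIT;
```

Wrapping both statements in one transaction keeps the "exactly one open event per user" invariant even if concurrent sessions write to the table.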

I know this must be a common problem in every piece of software that deals
with events, so I suppose something is already built into Postgres to deal
with it.

I looked at your solution; it's very clever, and we use something
similar in another module where we manage live calls and route
them to available agents.

kkh


#8Nico Williams
nico@cryptonector.com
In reply to: Khalil Khamlichi (#1)
Re: time series data

You have these choices:

- Turn events into INSERTs and UPDATEs on a table that represents a
single call.

You might have an events VIEW with INSTEAD OF insert/update triggers
so you can insert events as the interface for updating calls.

- Store the events and have a VIEW on the events table that gives you
rows that summarize each call.

- Both: store the events and the summaries of the calls.

You might have an events table with AFTER INSERT triggers to insert
or update the corresponding rows in the calls table.
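[Editorial note: a rough sketch of that last option follows; the events and calls tables, their columns, and the function name are all hypothetical.]

```sql
-- AFTER INSERT trigger on the raw events table keeps a per-call
-- summary table (calls) up to date as events stream in.
CREATE OR REPLACE FUNCTION events_to_calls() RETURNS trigger AS $$
BEGIN
    IF NEW.user_status = 'talking' THEN
        -- A call begins: open a summary row.
        INSERT INTO calls (user_name, started_at)
        VALUES (NEW.user_name, NEW.event_time);
    ELSIF NEW.user_status = 'after_call' THEN
        -- The call ended: close the user's open summary row.
        UPDATE calls
           SET ended_at = NEW.event_time
         WHERE user_name = NEW.user_name
           AND ended_at IS NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_summarize
AFTER INSERT ON events
FOR EACH ROW EXECUTE PROCEDURE events_to_calls();
```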

Nico

#9Jeremy Schneider
schneider@ardentperf.com
In reply to: Khalil Khamlichi (#7)
Re: time series data


Khalil, changing your schema is one solution with certain benefits -
but it's definitely not necessary when you have the power of
PostgreSQL at your fingertips. You can solve your problem without
changing anything at all. :)

All you need is a window function:
https://www.postgresql.org/docs/9.6/static/tutorial-window.html

Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and
9.3.17 (all the latest minors currently supported on RDS). You can
try these queries on your own system; they should work anywhere.

First, I added a second user to your data set to make sure we were
handling that case correctly.

==========
create table cc_events (user_name varchar(8), agent_status
varchar(20), event_time timestamp);

insert into cc_events values
('user1', 'ready', '2017-01-01 10:00:00'),
('user1', 'talking', '2017-01-01 10:02:00'),
('user2', 'ready', '2017-01-01 10:04:00'),
('user2', 'talking', '2017-01-01 10:05:00'),
('user1', 'after_call', '2017-01-01 10:07:00'),
('user1', 'ready', '2017-01-01 10:08:00'),
('user1', 'talking', '2017-01-01 10:10:00'),
('user1', 'after_call', '2017-01-01 10:15:00'),
('user2', 'after_call', '2017-01-01 10:18:00'),
('user1', 'paused', '2017-01-01 10:20:00'),
('user2', 'paused', '2017-01-01 10:21:00');

select * from cc_events order by user_name, event_time;

==========
here's a basic window function in action:

==========
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time
from cc_events order by event_time;

user_name | agent_status | event_time | next_event_time
-----------+--------------+---------------------+---------------------
user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00
user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00
user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00
user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00
user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00
user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00
user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00
user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00
user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00
user1 | paused | 2017-01-01 10:20:00 |
user2 | paused | 2017-01-01 10:21:00 |

==========
and now we just add one more column which does the subtraction to
calculate the duration:

==========
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time,
(lead(event_time) over (partition by user_name order by event_time))
- event_time as duration
from cc_events order by event_time;

user_name | agent_status | event_time | next_event_time | duration
-----------+--------------+---------------------+---------------------+----------
user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00
user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00
user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00
user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00
user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00
user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00
user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00
user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00
user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00
user1 | paused | 2017-01-01 10:20:00 | |
user2 | paused | 2017-01-01 10:21:00 | |

==========
it might also be convenient to wrap the window function in a common
table expression

https://www.postgresql.org/docs/9.6/static/queries-with.html

==========
with calculate_next_events as (
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time
from cc_events order by event_time
)
select user_name, agent_status, next_event_time-event_time duration
from calculate_next_events order by event_time;
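[Editorial note: to get the summary statistics the original question asked for (total duration, frequency, average per status), the same CTE can feed an ordinary GROUP BY; this is a sketch built directly on the query above.]

```sql
with calculate_next_events as (
  select user_name, agent_status, event_time,
         lead(event_time) over (partition by user_name order by event_time)
           next_event_time
  from cc_events
)
select user_name, agent_status,
       count(*)                          as frequency,
       sum(next_event_time - event_time) as total_duration,
       avg(next_event_time - event_time) as avg_duration
from calculate_next_events
where next_event_time is not null   -- skip each user's still-open last event
group by user_name, agent_status
order by user_name, agent_status;
```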

==========
Finally, if you really want to supercharge this and power-up even
more, besides temporal databases you might check out this recent blog
post about implementing a state machine in postgresql... it's really
interesting and closely related to what you're solving.

https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html

Hope this is helpful. Great to see that you're working on PostgreSQL -
it's a powerful engine to build with!

-Jeremy

--
http://about.me/jeremy_schneider


#10Joshua D. Drake
jd@commandprompt.com
In reply to: Khalil Khamlichi (#1)
Re: time series data

On 10/01/2017 01:17 AM, Khalil Khamlichi wrote:

Hi everyone,

Take a look at TimescaleDB; they have an extension to Postgres that makes
this awesome (and yes, it's free and open source).

jD
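[Editorial note: for illustration, TimescaleDB's time_bucket function can aggregate such events by time window. This sketch assumes the cc_events table from earlier in the thread, the extension installed, and an empty table at conversion time.]

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert cc_events into a hypertable partitioned on event_time.
SELECT create_hypertable('cc_events', 'event_time');

-- Events per user per hour.
SELECT time_bucket('1 hour', event_time) AS hour,
       user_name,
       count(*) AS events
FROM cc_events
GROUP BY hour, user_name
ORDER BY hour, user_name;
```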


--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


#11Khalil Khamlichi
khamlichi.khalil@gmail.com
In reply to: Joshua D. Drake (#10)
Re: time series data

Thanks, I'll check it out.

Sent via mobile, please forgive typos and brevity


#12Khalil Khamlichi
khamlichi.khalil@gmail.com
In reply to: Jeremy Schneider (#9)
Re: time series data

Thanks a lot Jeremy, we ended up integrating the code you provided
into our software (just before you patent it) :)

Best regards,

Kkh

user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00
user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00
user1 | paused | 2017-01-01 10:20:00 |
user2 | paused | 2017-01-01 10:21:00 |

==========
and now we just add one more column which does the subtraction to
calculate the duration:

==========
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time,
(lead(event_time) over (partition by user_name order by event_time))
- event_time as duration
from cc_events order by event_time;

user_name | agent_status | event_time | next_event_time | duration
-----------+--------------+---------------------+---------------------+----------
user1 | ready | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00
user1 | talking | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00
user2 | ready | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00
user2 | talking | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00
user1 | after_call | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00
user1 | ready | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00
user1 | talking | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00
user1 | after_call | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00
user2 | after_call | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00
user1 | paused | 2017-01-01 10:20:00 | |
user2 | paused | 2017-01-01 10:21:00 | |

==========
it might also be convenient to wrap the window function in a common
table expression

https://www.postgresql.org/docs/9.6/static/queries-with.html

==========
with calculate_next_events as (
select user_name, agent_status, event_time,
lead(event_time) over (partition by user_name order by event_time)
next_event_time
from cc_events order by event_time
)
select user_name, agent_status, next_event_time-event_time duration
from calculate_next_events order by event_time;

==========
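And to get the statistics from the original question (total duration,
frequency, average), the computed durations just need a GROUP BY on
top; here's a sketch along the same lines (the last event per user has
no following event, so its duration is null and gets filtered out):

==========
with durations as (
  select user_name, agent_status,
    lead(event_time) over (partition by user_name order by event_time)
      - event_time as duration
  from cc_events
)
select user_name, agent_status,
  count(*) as frequency,
  sum(duration) as total_duration,
  avg(duration) as avg_duration
from durations
where duration is not null
group by user_name, agent_status
order by user_name, agent_status;
==========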
Finally, if you really want to supercharge this, then besides temporal
databases you might check out this recent blog post about implementing
a state machine in PostgreSQL... it's really interesting and closely
related to what you're solving.

https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html

Hope this is helpful. Great to see that you're working on PostgreSQL -
it's a powerful engine to build with!

-Jeremy

--
http://about.me/jeremy_schneider

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general