obtain the difference between successive rows

Started by Olivier Chaussavoine, 21 messages, PostgreSQL general mailing list
#1 Olivier Chaussavoine
olivier.chaussavoine@gmail.com

Hi,
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:

It is complicated and takes a very long time to run. In MySQL the problem
could be solved simply by creating a new field and updating it with a
statement that uses @ user variables.
I believe this is a common problem for users. Do you know a simpler solution
in PostgreSQL?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/obtain-the-difference-between-successive-rows-tp5729149.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#2 Raymond O'Donnell
rod@iol.ie
In reply to: Olivier Chaussavoine (#1)
Re: obtain the difference between successive rows

On 20/10/2012 11:54, ochaussavoine wrote:

Hi,
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:

It is complicated and takes a very long time to run. In MySQL the problem
could be solved simply by creating a new field and updating it with a
statement that uses @ user variables.
I believe this is a common problem for users. Do you know a simpler solution
in PostgreSQL?

I think you can do it with a window function.

http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

#3 Thalis Kalfigkopoulos
tkalfigo@gmail.com
In reply to: Raymond O'Donnell (#2)
Re: obtain the difference between successive rows

On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:

On 20/10/2012 11:54, ochaussavoine wrote:

Hi,
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:

I think you can do it with a window function.

http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html

Ray.

In particular, you're probably looking for the lag() window function.
For example, if you have a timestamp column "ts" that increases
monotonically and you want to compare each row's timestamp with the
chronologically previous row's timestamp, you'd do something like:
SELECT id, ts, lag(ts) OVER (ORDER BY ts) AS prev_ts FROM mytable;
This displays the previous row's ts as the third column; ts - lag(ts)
OVER (ORDER BY ts) would give the difference directly.

You may find this introduction to window functions useful:
https://www.pgcon.org/2009/schedule/attachments/98_Windowing%20Functions.pdf

best regards,
Thalis
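To make the lag() semantics above concrete, here is a minimal Python sketch (not PostgreSQL code) of what `lag(value, offset, default)` returns for rows that are already in the window's ORDER BY order. The helper name `lag` and the sample values are hypothetical. As in PostgreSQL, the result is NULL (here `None`) when no row exists at the requested offset, unless a default is given.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def lag(values: Sequence[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """Hypothetical helper: for each position i, return values[i - offset],
    or `default` when that position lies before the first row.

    `values` must already be sorted the way the window's ORDER BY sorts them.
    """
    if offset < 0:
        raise ValueError("this sketch only supports non-negative offsets")
    return [values[i - offset] if i >= offset else default for i in range(len(values))]


# Hypothetical "ts" values (e.g. epoch seconds), already in ORDER BY ts order.
ts = [100, 160, 175, 240]
prev_ts = lag(ts)  # like lag(ts) OVER (ORDER BY ts)
for cur, prev in zip(ts, prev_ts):
    # The first row has no predecessor, so its difference is None (NULL in SQL).
    print(cur, prev, None if prev is None else cur - prev)
```

The window function does this pairing inside the query, so the client never has to hold two rows at once.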

#4 Berend Tober
btober@broadstripe.net
In reply to: Thalis Kalfigkopoulos (#3)
Re: obtain the difference between successive rows

Thalis Kalfigkopoulos wrote:

On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:

On 20/10/2012 11:54, ochaussavoine wrote:

I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:

I think you can do it with a window function.

In particular, you're probably looking for the lag() window function.

What if there is more than one column you want the
difference for (... coincidentally, I am writing an article on this
topic right now! ...), say a table that records a
metered quantity at not-quite-regular intervals:

CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);

with sample data:

'2012-09-07',2158
'2012-10-05',3018

and I want an output such as:

Meter Read on October 5

Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860

Number service days = 28

I am working on a write-up of a neat solution using CTEs, but
would be interested in others' views.

#5 Berend Tober
btober@broadstripe.net
In reply to: Thalis Kalfigkopoulos (#3)
Re: obtain the difference between successive rows

Thalis Kalfigkopoulos wrote:

On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:

On 20/10/2012 11:54, ochaussavoine wrote:

I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:

I think you can do it with a window function.

In particular, you're probably looking for the lag() window function.

What if there is more than one column you want the
difference for (... coincidentally, I am writing an article on this
topic right now! ...), say a table that records a
metered quantity at not-quite-regular intervals:

CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);

with sample data:

'2012-09-07',2158
'2012-10-05',3018

and I want an output such as:

Meter Read on October 5

Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860

Number service days = 28

I am working on a write-up of a neat solution using CTEs, but
would be interested in others' views.

#6 Chris Angelico
rosuav@gmail.com
In reply to: Berend Tober (#5)
Re: obtain the difference between successive rows

On Sun, Oct 21, 2012 at 2:30 AM, Berend Tober <btober@broadstripe.net> wrote:

What if there is more than one column you want the difference for (...
coincidentally, I am writing an article on this topic right now! ...), say a
table that records a metered quantity at not-quite-regular
intervals:
...

I am working on a write-up of a neat solution using CTEs, but would be
interested in others' views.

To be quite honest, I would simply read the table directly and then do
the processing in an application language :) But two window functions
should do the trick. Whether or not it's actually more efficient that
way is another question.

ChrisA

#7 Thalis Kalfigkopoulos
tkalfigo@gmail.com
In reply to: Berend Tober (#5)
Re: obtain the difference between successive rows

What if there is more than one column you want the difference for (...
coincidentally, I am writing an article on this topic right now! ...), say a
table that records a metered quantity at not-quite-regular
intervals:

CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);

with sample data:

'2012-09-07',2158
'2012-10-05',3018

and I want an output such as:

Meter Read on October 5

Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860

Number service days = 28

No problem with that either.
-- date minus date already yields an integer number of days, so no extract() is needed
SELECT current_meter_reading - lag(current_meter_reading)
         OVER (ORDER BY current_reading_date) AS kWh_diff,
       current_reading_date - lag(current_reading_date)
         OVER (ORDER BY current_reading_date) AS num_service_days
FROM electricity;

Note that both window clauses are ordered by current_reading_date.
That column defines which row is the "previous" one, and it is that
row's values (current_meter_reading and current_reading_date) that I
want to compare against.

regards,
Thalis
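A minimal Python sketch of the same computation, using the two sample rows from the thread: both differences are taken against the same predecessor, the row immediately before in date order. It also shows why no `extract()` is needed for the day count. In PostgreSQL, subtracting one `date` from another already gives an integer number of days, much as subtracting Python `date` objects gives a `timedelta` whose `.days` is an integer. The function name `usage_rows` is hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

# The two sample rows from the thread: (current_reading_date, current_meter_reading).
readings = [
    (date(2012, 10, 5), 3018),
    (date(2012, 9, 7), 2158),
]


def usage_rows(
    rows: List[Tuple[date, int]],
) -> List[Tuple[date, Optional[int], Optional[int]]]:
    """Hypothetical helper mimicking both lag() calls: every row is compared
    with the row just before it in reading-date order."""
    out: List[Tuple[date, Optional[int], Optional[int]]] = []
    prev: Optional[Tuple[date, int]] = None
    for reading_date, reading in sorted(rows):  # the shared ORDER BY current_reading_date
        if prev is None:
            out.append((reading_date, None, None))  # no predecessor: lag() gives NULL
        else:
            prev_date, prev_reading = prev
            kwh_diff = reading - prev_reading
            num_service_days = (reading_date - prev_date).days  # whole days between readings
            out.append((reading_date, kwh_diff, num_service_days))
        prev = (reading_date, reading)
    return out


for row in usage_rows(readings):
    print(row)
```

The input is deliberately unsorted. Only the shared ordering step decides which row counts as "previous" for both columns.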

#8 Berend Tober
btober@broadstripe.net
In reply to: Thalis Kalfigkopoulos (#7)
Re: obtain the difference between successive rows

Thalis Kalfigkopoulos wrote:

-- date minus date already yields an integer number of days, so no extract() is needed
SELECT current_meter_reading - lag(current_meter_reading)
         OVER (ORDER BY current_reading_date) AS kWh_diff,
       current_reading_date - lag(current_reading_date)
         OVER (ORDER BY current_reading_date) AS num_service_days
FROM electricity;

How would you get the previous reading (and perhaps the previous
read date) to also appear on the same output row? The sample
table with the subtraction I showed for illustration is literally
what is printed on the bill ... they are not just presenting the
quantity used and the number of days, but actually the dates and
meter readings used to do the arithmetic.

#9 Raymond O'Donnell
rod@iol.ie
In reply to: Berend Tober (#8)
Re: obtain the difference between successive rows

On 20/10/2012 17:02, Berend Tober wrote:

Thalis Kalfigkopoulos wrote:

-- date minus date already yields an integer number of days, so no extract() is needed
SELECT current_meter_reading - lag(current_meter_reading)
         OVER (ORDER BY current_reading_date) AS kWh_diff,
       current_reading_date - lag(current_reading_date)
         OVER (ORDER BY current_reading_date) AS num_service_days
FROM electricity;

How would you get the previous reading (and perhaps the previous read
date) to also appear on the same output row? The sample table with the
subtraction I showed for illustration is literally what is printed on
the bill ... they are not just presenting the quantity used and the
number of days, but actually the dates and meter readings used to do the
arithmetic.

Just include them in the SELECT:

SELECT
lag(current_meter_reading) OVER(ORDER BY current_reading_date) AS
kWh_prev,
current_meter_reading - lag(current_meter_reading) OVER(ORDER BY
current_reading_date) AS kWh_diff,

(...etc...)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
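Ray's point can be sketched in Python. This is an approximation: the rows are the two sample readings from the thread, and the output layout only roughly follows the bill Berend describes. Carrying the previous date and reading onto each row is what the extra lag() columns do. Once they are there, the arithmetic can be printed alongside its inputs. `BillLine`, `bill_lines` and `render` are hypothetical names.

```python
from datetime import date
from typing import List, NamedTuple, Optional, Tuple


class BillLine(NamedTuple):
    prev_date: Optional[date]
    reading_date: date
    prev_reading: Optional[int]
    reading: int


def bill_lines(rows: List[Tuple[date, int]]) -> List[BillLine]:
    """Carry the previous date and reading onto each row, the way extra
    lag() columns in the SELECT list would."""
    lines: List[BillLine] = []
    prev_date: Optional[date] = None
    prev_reading: Optional[int] = None
    for reading_date, reading in sorted(rows):
        lines.append(BillLine(prev_date, reading_date, prev_reading, reading))
        prev_date, prev_reading = reading_date, reading
    return lines


def render(line: BillLine) -> str:
    """Format one line roughly like the sample bill in the thread."""
    if line.prev_reading is None or line.prev_date is None:
        return f"{line.reading} (no previous reading)"
    used = line.reading - line.prev_reading
    days = (line.reading_date - line.prev_date).days
    return f"{line.reading} - {line.prev_reading} = {used}, service days = {days}"


readings = [(date(2012, 9, 7), 2158), (date(2012, 10, 5), 3018)]
for line in bill_lines(readings):
    print(render(line))
```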

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Angelico (#6)
Re: obtain the difference between successive rows

Chris Angelico <rosuav@gmail.com> writes:

To be quite honest, I would simply read the table directly and then do
the processing in an application language :) But two window functions
should do the trick. Whether or not it's actually more efficient that
way is another question.

FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.

Having said that, they are pretty expensive. I tend to agree that doing
the processing on the application side might be faster --- but only if
you've got a place to put such code there. If you've just got generic
query-result display code, there may not be any convenient way to do it.

regards, tom lane
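Here is a minimal Python sketch of the application-side route Chris and Tom mention: let the query do the ORDER BY, then walk the rows once, keeping only the previous value. It accepts any iterable of (key, value) pairs. Many Python database drivers let you iterate over a cursor this way, but the rows below are just a list standing in for query results (meter #2's first three readings from the thread). The name `successive_differences` is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def successive_differences(
    rows: Iterable[Tuple[str, int]],
) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield (key, value - previous value) for rows that are already ordered
    (e.g. by an ORDER BY in the query). Only the previous value is held in
    memory, so rows can be consumed as they arrive."""
    prev: Optional[int] = None
    for key, value in rows:
        yield key, (None if prev is None else value - prev)
        prev = value


# Stand-in for query results: (reading_date, meter_reading), in date order.
rows = [("1997-04-14", 0), ("1997-05-08", 573), ("1997-06-12", 1709)]
for key, diff in successive_differences(rows):
    print(key, diff)
```

The check is `prev is None` rather than `not prev`, so a genuine reading of 0 (like the first one here) is still treated as a valid predecessor.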

#11 Raymond O'Donnell
rod@iol.ie
In reply to: Tom Lane (#10)
Re: obtain the difference between successive rows

On 20/10/2012 17:23, Tom Lane wrote:

Chris Angelico <rosuav@gmail.com> writes:

To be quite honest, I would simply read the table directly and then do
the processing in an application language :) But two window functions
should do the trick. Whether or not it's actually more efficient that
way is another question.

FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.

Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?

Having said that, they are pretty expensive. I tend to agree that doing
the processing on the application side might be faster --- but only if
you've got a place to put such code there. If you've just got generic

True, assuming that you're working with a language that handles dates
well... I do a lot of PHP, and have found that it's generally safer to
handle date arithmetic in Postgres.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

#12 Berend Tober
btober@broadstripe.net
In reply to: Raymond O'Donnell (#9)
Re: obtain the difference between successive rows

Raymond O'Donnell wrote:

On 20/10/2012 17:02, Berend Tober wrote:

Thalis Kalfigkopoulos wrote:
How would you get the previous reading (and perhaps the previous read
date) to also appear ...

Just include them in the SELECT:

Well, that is surprisingly easy!

How about this then: the table includes data for more than one
meter. I moved (I've been keeping this data for two decades ...
yes, I know...) to a new house, and in the new house, the utility
company has replaced the meter (one of those "smart" meters). So
the table has a foreign key reference to the primary key
identifying the meter:

CREATE TABLE electricity
(
electric_meter_pk integer,
current_reading_date date,
current_meter_reading integer
);

with sample data:

2 | 1997-04-14 | 0
2 | 1997-05-08 | 573
2 | 1997-06-12 | 1709
...
2 | 2009-09-14 |152941
3 | 2009-06-26 | 68502
3 | 2009-08-13 | 69738
...
3 | 2012-07-06 |118953
3 | 2012-07-18 |119185
4 | 2012-07-18 | 0
4 | 2012-08-06 | 887
4 | 2012-09-07 | 2158
4 | 2012-10-05 | 3018

Your suggestion almost worked as-is for this, except that the
readings for meter #2 and meter #3 overlap (I briefly owned two
houses), and with the window ordered by date alone, lag() mixed
readings from the two meters:

SELECT
electric_meter_pk,
lag(reading_date)
OVER(ORDER BY reading_date) as prev_date,
reading_date,
lag(meter_reading)
OVER(ORDER BY reading_date) AS prev_reading,
meter_reading,
meter_reading - lag(meter_reading)
OVER(ORDER BY reading_date) AS kWh_diff,
reading_date - lag(reading_date)
OVER(ORDER BY reading_date) as num_service_days
FROM electric
order by 1,3;

2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 | 1179 |32
2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 | 1365 |30
2 | 2009-06-26 | 2009-07-14 | 68502 | 149808 | 81306 |18
2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 |29
2 | 2009-09-12 | 2009-09-14 | 70934 | 152941 | 82007 | 2
3 | 2009-06-10 | 2009-06-26 | 148139 | 68502 |-79637 |16
3 | 2009-08-12 | 2009-08-13 | 151584 | 69738 |-81846 | 1
3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 |30
3 | 2009-09-14 | 2009-10-14 | 152941 | 71918 |-81023 |30
3 | 2009-10-14 | 2009-11-11 | 71918 | 72952 | 1034 |28

#13 Chris Angelico
rosuav@gmail.com
In reply to: Raymond O'Donnell (#11)
Re: obtain the difference between successive rows

On Sun, Oct 21, 2012 at 3:29 AM, Raymond O'Donnell <rod@iol.ie> wrote:

On 20/10/2012 17:23, Tom Lane wrote:

Having said that, they are pretty expensive. I tend to agree that doing
the processing on the application side might be faster --- but only if
you've got a place to put such code there. If you've just got generic

True, assuming that you're working with a language that handles dates
well... I do a lot of PHP, and have found that it's generally safer to
handle date arithmetic in Postgres.

That's because PHP is a sucky language :)

For a job like this, I'd probably whip up something in Pike. Excellent
string handling, easy database access (including an efficient
implementation of the pgsql protocol - skips the usual underlying
library and talks directly to the server), and the best Unicode
support I've seen in any application language (now equalled by Python,
as of version 3.3 that just came out). Biggest downside is that it's a
tad obscure.

object db=Sql.Sql("pgsql://username:password@hostname/database");
db->query("select blah from blah");

ChrisA

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Raymond O'Donnell (#11)
Re: obtain the difference between successive rows

"Raymond O'Donnell" <rod@iol.ie> writes:

On 20/10/2012 17:23, Tom Lane wrote:

FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.

Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?

Not sure what you're proposing? I don't see how you'd solve this
problem with a CTE, at least not without a join, which seems unlikely
to be a win.

regards, tom lane

#15 Berend Tober
btober@broadstripe.net
In reply to: Berend Tober (#12)
Re: obtain the difference between successive rows

Berend Tober wrote:

Raymond O'Donnell wrote:

On 20/10/2012 17:02, Berend Tober wrote:

Thalis Kalfigkopoulos wrote:
How would you get the previous reading (and perhaps the
previous read
date) to also appear ...

Just include them in the SELECT:

Well, that is surprisingly easy!

How about this then: the table includes data for more than one
meter....

Almost answering my own question. Adding the meter key to the window's ORDER BY:

SELECT
electric_meter_pk,
lag(reading_date)
OVER(ORDER BY electric_meter_pk,reading_date)
as prev_date,
reading_date,
lag(meter_reading)
OVER(ORDER BY electric_meter_pk,reading_date)
AS prev_reading,
meter_reading,
meter_reading - lag(meter_reading)
OVER(ORDER BY electric_meter_pk,reading_date)
AS kWh_diff,
reading_date - lag(reading_date)
OVER(ORDER BY electric_meter_pk,reading_date)
as num_service_days
FROM home.electric
order by 1,3;

Gives all good as far as lining up dates, except it does not
cross the new-meter boundary gracefully:

2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 | 29
2 | 2009-08-12 | 2009-09-14 | 151584 | 152941 | 1357 | 33
*3 | 2009-09-14 | 2009-06-26 | 152941 | 68502 | -84439 |-80
3 | 2009-06-26 | 2009-08-13 | 68502 | 69738 | 1236 | 48
3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 | 30
...
3 | 2012-05-04 | 2012-06-07 | 116091 | 117469 | 1378 | 34
3 | 2012-06-07 | 2012-07-06 | 117469 | 118953 | 1484 | 29
3 | 2012-07-06 | 2012-07-18 | 118953 | 119185 | 232 | 12
*4 | 2012-07-18 | 2012-07-18 | 119185 | 0 | -119185 | 0
4 | 2012-07-18 | 2012-08-06 | 0 | 887 | 887 | 19
4 | 2012-08-06 | 2012-09-07 | 887 | 2158 | 1271 | 32
4 | 2012-09-07 | 2012-10-05 | 2158 | 3018 | 860 | 28

The first-row-initialization problem is what led me to consider
a recursive CTE. I have something that works and does not use
window functions, but I think it requires a more detailed
explanation than I have prepared at this time.

#16 Raymond O'Donnell
rod@iol.ie
In reply to: Tom Lane (#14)
Re: obtain the difference between successive rows

On 20/10/2012 17:50, Tom Lane wrote:

"Raymond O'Donnell" <rod@iol.ie> writes:

On 20/10/2012 17:23, Tom Lane wrote:

FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.

Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?

Not sure what you're proposing? I don't see how you'd solve this
problem with a CTE, at least not without a join, which seems unlikely
to be a win.

Yes, I see what you mean.... was waving my hands a bit. :-)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

#17 Victor Yegorov
vyegorov@gmail.com
In reply to: Berend Tober (#12)
Re: obtain the difference between successive rows

2012/10/20 Berend Tober <btober@broadstripe.net>:

Your suggestion almost worked as-is for this, except that the readings for
meter #2 and meter #3 overlap (I briefly owned two houses), and with the
window ordered by date alone, lag() mixed readings from the two meters:

SELECT
electric_meter_pk,
lag(reading_date)
OVER(ORDER BY reading_date) as prev_date,
reading_date,
lag(meter_reading)
OVER(ORDER BY reading_date) AS prev_reading,
meter_reading,
meter_reading - lag(meter_reading)
OVER(ORDER BY reading_date) AS kWh_diff,
reading_date - lag(reading_date)
OVER(ORDER BY reading_date) as num_service_days
FROM electric
order by 1,3;

2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 | 1179 |32
2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 | 1365 |30
2 | 2009-06-26 | 2009-07-14 | 68502 | 149808 | 81306 |18
2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 |29
2 | 2009-09-12 | 2009-09-14 | 70934 | 152941 | 82007 | 2
3 | 2009-06-10 | 2009-06-26 | 148139 | 68502 |-79637 |16
3 | 2009-08-12 | 2009-08-13 | 151584 | 69738 |-81846 | 1
3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 |30
3 | 2009-09-14 | 2009-10-14 | 152941 | 71918 |-81023 |30
3 | 2009-10-14 | 2009-11-11 | 71918 | 72952 | 1034 |28

You can use
… OVER (PARTITION BY electric_meter_pk ORDER BY reading_date)
to split your data by meter.

--
Victor Y. Yegorov
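Here is a rough Python analogue of what PARTITION BY changes (not how PostgreSQL executes it): sort by meter and then by date, group by meter, and restart the "previous row" at each group boundary. The rows are a subset of Berend's sample data around the switch from meter 3 to meter 4. `partitioned_diffs` is a hypothetical name. With ORDER BY electric_meter_pk, reading_date alone, meter 4's first reading was compared with meter 3's last reading. Here it gets no predecessor instead.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter
from typing import List, Optional, Tuple

# A subset of the sample rows from the thread:
# (electric_meter_pk, reading_date, meter_reading), deliberately unsorted.
readings = [
    (3, date(2012, 7, 6), 118953),
    (4, date(2012, 7, 18), 0),
    (3, date(2012, 7, 18), 119185),
    (4, date(2012, 8, 6), 887),
]

# (meter, reading_date, kWh used, service days); None where there is no previous reading.
Row = Tuple[int, date, Optional[int], Optional[int]]


def partitioned_diffs(rows: List[Tuple[int, date, int]]) -> List[Row]:
    """Rough analogue of lag(...) OVER (PARTITION BY electric_meter_pk
    ORDER BY reading_date): the previous row never comes from another meter."""
    out: List[Row] = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # partition key, then ORDER BY key
    for meter, group in groupby(ordered, key=itemgetter(0)):
        prev: Optional[Tuple[date, int]] = None  # reset at each meter boundary
        for _, reading_date, reading in group:
            if prev is None:
                out.append((meter, reading_date, None, None))
            else:
                prev_date, prev_reading = prev
                out.append((meter, reading_date, reading - prev_reading,
                            (reading_date - prev_date).days))
            prev = (reading_date, reading)
    return out


for row in partitioned_diffs(readings):
    print(row)
```

The 232 kWh / 12 days and 887 kWh / 19 days results match the corresponding rows in Berend's output, while the spurious -119185 row disappears.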

#18 Berend Tober
btober@broadstripe.net
In reply to: Victor Yegorov (#17)
Re: obtain the difference between successive rows

Виктор Егоров wrote:

2012/10/20 Berend Tober <btober@broadstripe.net>:

Your suggestion almost worked as is for this, except that you have to note
that reading for meter #2 and meter #3 overlap ...

You can use
… OVER (PARTITION BY electric_meter_pk ORDER BY reading_date)
to split your data by meter.

That looks like it works great! Much simpler-looking SQL than what I was
working on.

Thanks!

#19 Jasen Betts
jasen@xnet.co.nz
In reply to: Olivier Chaussavoine (#1)
Re: obtain the difference between successive rows

On 2012-10-20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Raymond O'Donnell" <rod@iol.ie> writes:

On 20/10/2012 17:23, Tom Lane wrote:

FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.

Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?

Not sure what you're proposing? I don't see how you'd solve this
problem with a CTE, at least not without a join, which seems unlikely
to be a win.

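-- Sample data: 20 rows, then delete roughly half of them at random
select generate_series(1,20) as a,
       (generate_series(1,20)*9 + random()*7)::int as b
into temp table foo;
delete from foo where random() > 0.5;

-- A single lag() over the whole row f: g is the previous row as a composite value
with
j as (select f, lag(f) over (order by f.a) as g from foo as f)
select (f).*, (g).a as "lag a", (g).b as "lag b",
       (f).a - (g).a as "diff(a)", (f).b - (g).b as "diff(b)"
from j;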

--
⚂⚃ 100% natural
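The idea in the query above is to lag the whole row `f` once instead of calling lag() per column. It can be sketched in Python with named tuples: pair each row with its predecessor in `a` order, then subtract field by field. `Foo`, `lag_whole_row` and the sample rows are hypothetical stand-ins for the randomly generated table.

```python
from typing import List, NamedTuple, Optional, Tuple


class Foo(NamedTuple):
    a: int
    b: int


def lag_whole_row(rows: List[Foo]) -> List[Tuple[Foo, Optional[Foo], Optional[Foo]]]:
    """Return (row, previous row, field-wise difference) for each row,
    ordered by `a`, like lag(f) OVER (ORDER BY f.a) on the whole row."""
    ordered = sorted(rows, key=lambda r: r.a)
    previous: List[Optional[Foo]] = [None] + ordered[:-1]  # one lag for all columns
    out = []
    for cur, prev in zip(ordered, previous):
        diff = None if prev is None else Foo(*(c - p for c, p in zip(cur, prev)))
        out.append((cur, prev, diff))
    return out


# Hypothetical rows left over after the random DELETE.
sample = [Foo(4, 41), Foo(1, 14), Foo(7, 66), Foo(3, 30)]
for cur, prev, diff in lag_whole_row(sample):
    print(cur, prev, diff)
```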

#20 Jasen Betts
jasen@xnet.co.nz
In reply to: Olivier Chaussavoine (#1)
Re: obtain the difference between successive rows

On 2012-10-20, Berend Tober <btober@broadstripe.net> wrote:

Thalis Kalfigkopoulos wrote:

On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:

On 20/10/2012 11:54, ochaussavoine wrote:

I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:

I think you can do it with a window function.

In particular, you're probably looking for the lag() window function.

What if there is more than one column you want the
difference for (... coincidentally, I am writing an article on this
topic right now! ...), say a table that records a
metered quantity at not-quite-regular intervals:

CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);

with sample data:

'2012-09-07',2158
'2012-10-05',3018

and I want an output such as:

Meter Read on October 5

Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860

Number service days = 28

I am working on a write-up of a neat solution using CTEs, but
would be interested in others' views.

An electricity meter may be a bad example, as usage meters often have
fewer digits than are needed to track all historical usage, so the
reading eventually wraps around past zero, e.g.:

'2012-05-07',997743
'2012-06-06',999601
'2012-07-05',000338
'2012-08-06',001290
'2012-09-07',002158
'2012-10-05',003018

--
⚂⚃ 100% natural
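Here is one way to handle the wrap-around, sketched in Python under two assumptions: the register has six digits (so it counts modulo 1,000,000, as the sample readings suggest), and it wraps at most once between consecutive readings. Taking the difference modulo the register size turns the 999601 to 000338 step into 737 kWh instead of a large negative number. `usage_with_rollover` is a hypothetical name.

```python
from datetime import date
from typing import List, Optional, Tuple

# Assumption: a six-digit register that rolls over from 999999 to 000000.
METER_MODULUS = 10 ** 6

# Jasen's example readings; 000338 etc. are written without leading zeros,
# since Python does not allow them in integer literals.
readings = [
    (date(2012, 5, 7), 997743),
    (date(2012, 6, 6), 999601),
    (date(2012, 7, 5), 338),
    (date(2012, 8, 6), 1290),
    (date(2012, 9, 7), 2158),
    (date(2012, 10, 5), 3018),
]


def usage_with_rollover(
    rows: List[Tuple[date, int]], modulus: int = METER_MODULUS
) -> List[Tuple[date, Optional[int]]]:
    """Usage since the previous reading, assuming at most one rollover between readings."""
    out: List[Tuple[date, Optional[int]]] = []
    prev: Optional[int] = None
    for reading_date, reading in sorted(rows):
        if prev is None:
            out.append((reading_date, None))
        else:
            # With a positive modulus, Python's % never returns a negative number,
            # so 338 - 999601 becomes 737 rather than -999263.
            out.append((reading_date, (reading - prev) % modulus))
        prev = reading
    return out


for reading_date, used in usage_with_rollover(readings):
    print(reading_date, used)
```

The same idea in SQL needs more care, because PostgreSQL's `%` operator keeps the sign of the dividend (for example, `-5 % 3` is `-2`), unlike Python's.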

#21 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jasen Betts (#20)