obtain the difference between successive rows
Hi,
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:
It is complicate and very long to perform. The problem could be simply
solved with MySql by creating a new field and updating it using a statement
with @.
I beleive this is a common problem for users. Do you know a simpler solution
with postgreSql?
--
View this message in context: http://postgresql.1045698.n5.nabble.com/obtain-the-difference-between-successive-rows-tp5729149.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 20/10/2012 11:54, ochaussavoine wrote:
Hi,
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:It is complicate and very long to perform. The problem could be simply
solved with MySql by creating a new field and updating it using a statement
with @.
I beleive this is a common problem for users. Do you know a simpler solution
with postgreSql?
I think you can do it with a window function.
http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html
Ray.
--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 20/10/2012 11:54, ochaussavoine wrote:
Hi,
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:I think you can do it with a window function.
http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.htmlRay.
In particular you're looking probably for the lag() window function.
For example if you have a timestamp column "ts" that's increasing
monotonically and you want to check the difference of each row's
timestamp with the chronologically previous row's timestamp you'd do
something like:
$ SELECT id, ts, lag(ts) OVER (order by ts) AS prev_ts FROM mytable;
This will display as third column the previous row's ts.
You may find reading this introduction to window fuctions useful:
https://www.pgcon.org/2009/schedule/attachments/98_Windowing%20Functions.pdf
best tregards,
Thalis
Thalis Kalfigkopoulos wrote:
On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 20/10/2012 11:54, ochaussavoine wrote:
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:I think you can do it with a window function.
In particular you're looking probably for the lag() window function.
What about if there is more than one column you want the
difference for (... coincidentally I am writing a article on this
topic right now! ...), say a table which is used to record a
metered quantity at not-quite regular intervals:
CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);
with sample data:
'2012-09-07',2158
'2012-10-05',3018
and I want an output such as:
Meter Read on October 5
Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860
Number service days = 28
I am working on a write-up of a neat solution using CTE's, but
would be interested in other's views.
Thalis Kalfigkopoulos wrote:
On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 20/10/2012 11:54, ochaussavoine wrote:
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:I think you can do it with a window function.
In particular you're looking probably for the lag() window function.
What about if there is more than one column you want the
difference for (... coincidentally I am writing a article on this
topic right now! ...), say a table which is used to record a
metered quantity at not-quite regular intervals:
CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);
with sample data:
'2012-09-07',2158
'2012-10-05',3018
and I want an output such as:
Meter Read on October 5
Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860
Number service days = 28
I am working on a write-up of a neat solution using CTE's, but
would be interested in other's views.
On Sun, Oct 21, 2012 at 2:30 AM, Berend Tober <btober@broadstripe.net> wrote:
What about if there is more than one column you want the difference for (...
coincidentally I am writing a article on this topic right now! ...), say a
table which is used to record a metered quantity at not-quite regular
intervals:
...I am working on a write-up of a neat solution using CTE's, but would be
interested in other's views.
To be quite honest, I would simply read the table directly and then do
the processing in an application language :) But two window functions
should do the trick. Whether or not it's actually more efficient that
way is another question.
ChrisA
What about if there is more than one column you want the difference for (...
coincidentally I am writing a article on this topic right now! ...), say a
table which is used to record a metered quantity at not-quite regular
intervals:CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);with sample data:
'2012-09-07',2158
'2012-10-05',3018and I want an output such as:
Meter Read on October 5
Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860Number service days = 28
No problem with that either.
$ SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER
BY current_reading_date) AS kWh_diff, extract('days' FROM
current_reading_date - lag(current_reading_date) OVER(ORDER BY
current_reading_date)) as num_service_days FROM mytable;
Note how ORDER BY is in both cases done by current_reading_date. This
is because the current_reading_date defines the concept of
previous/next row whose values (either current_meter_reading or
current_reading_date) I want to be comparing.
regards,
Thalis
Thalis Kalfigkopoulos wrote:
SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER
BY current_reading_date) AS kWh_diff, extract('days' FROM
current_reading_date - lag(current_reading_date) OVER(ORDER BY
current_reading_date)) as num_service_days FROM mytable;
How would you get the previous reading (and perhaps the previous
read date) to also appear on the same output row? The sample
table with the subtraction I showed for illustration is literally
what is printed on the bill ... they are not just presenting the
quantity used and the number of days, but actually the dates and
meter readings used to do the arithmetic.
On 20/10/2012 17:02, Berend Tober wrote:
Thalis Kalfigkopoulos wrote:
SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER
BY current_reading_date) AS kWh_diff, extract('days' FROM
current_reading_date - lag(current_reading_date) OVER(ORDER BY
current_reading_date)) as num_service_days FROM mytable;How would you get the previous reading (and perhaps the previous read
date) to also appear on the same output row? The sample table with the
subtraction I showed for illustration is literally what is printed on
the bill ... they are not just presenting the quantity used and the
number of days, but actually the dates and meter readings used to do the
arithmetic.
Just include them in the SELECT:
SELECT
lag(current_meter_reading) OVER(ORDER BY current_reading_date) AS
kWh_prev,
current_meter_reading - lag(current_meter_reading) OVER(ORDER BY
current_reading_date) AS kWh_diff,
(...etc...)
Ray.
--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
Chris Angelico <rosuav@gmail.com> writes:
To be quite honest, I would simply read the table directly and then do
the processing in an application language :) But two window functions
should do the trick. Whether or not it's actually more efficient that
way is another question.
FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.
Having said that, they are pretty expensive. I tend to agree that doing
the processing on the application side might be faster --- but only if
you've got a place to put such code there. If you've just got generic
query-result display code, there may not be any convenient way to do it.
regards, tom lane
On 20/10/2012 17:23, Tom Lane wrote:
Chris Angelico <rosuav@gmail.com> writes:
To be quite honest, I would simply read the table directly and then do
the processing in an application language :) But two window functions
should do the trick. Whether or not it's actually more efficient that
way is another question.FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.
Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?
Having said that, they are pretty expensive. I tend to agree that doing
the processing on the application side might be faster --- but only if
you've got a place to put such code there. If you've just got generic
True, assuming that you're working with a language that handles dates
well... I do a lot of PHP, and have found that it's generally safer to
handle date arithmetic in Postges.
Ray.
--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
Raymond O'Donnell wrote:
On 20/10/2012 17:02, Berend Tober wrote:
Thalis Kalfigkopoulos wrote:
How would you get the previous reading (and perhaps the previous read
date) to also appear ...Just include them in the SELECT:
Well, that is surprisingly easy!
How about this then: the table includes data for more than one
meter. I moved (I've been keeping this data for two decades ...
yes, I know...) to a new house, and in the new house, the utility
company has replaced the meter (one of those "smart" meters). So
the table has a foreign key reference to the primary key
identifying the meter:
CREATE TABLE electricity
(
electric_meter_pk integer,
current_reading_date date,
current_meter_reading integer
);
with sample data:
2 | 1997-04-14 | 0
2 | 1997-05-08 | 573
2 | 1997-06-12 | 1709
...
2 | 2009-09-14 |152941
3 | 2009-06-26 | 68502
3 | 2009-08-13 | 69738
...
3 | 2012-07-06 |118953
3 | 2012-07-18 |119185
4 | 2012-07-18 | 0
4 | 2012-08-06 | 887
4 | 2012-09-07 | 2158
4 | 2012-10-05 | 3018
Your suggestion almost worked as is for this, except that you
have to note that reading for meter #2 and meter #3 overlap (I
briefly owned two houses), and that seemed to confuse the lag()
function:
SELECT
electric_meter_pk,
lag(reading_date)
OVER(ORDER BY reading_date) as prev_date,
reading_date,
lag(meter_reading)
OVER(ORDER BY reading_date) AS prev_reading,
meter_reading,
meter_reading - lag(meter_reading)
OVER(ORDER BY reading_date) AS kWh_diff,
reading_date - lag(reading_date)
OVER(ORDER BY reading_date) as num_service_days
FROM electric
order by 1,3;
2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 | 1179 |32
2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 | 1365 |30
2 | 2009-06-26 | 2009-07-14 | 68502 | 149808 | 81306 |18
2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 |29
2 | 2009-09-12 | 2009-09-14 | 70934 | 152941 | 82007 | 2
3 | 2009-06-10 | 2009-06-26 | 148139 | 68502 |-79637 |16
3 | 2009-08-12 | 2009-08-13 | 151584 | 69738 |-81846 | 1
3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 |30
3 | 2009-09-14 | 2009-10-14 | 152941 | 71918 |-81023 |30
3 | 2009-10-14 | 2009-11-11 | 71918 | 72952 | 1034 |28
On Sun, Oct 21, 2012 at 3:29 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 20/10/2012 17:23, Tom Lane wrote:
Having said that, they are pretty expensive. I tend to agree that doing
the processing on the application side might be faster --- but only if
you've got a place to put such code there. If you've just got genericTrue, assuming that you're working with a language that handles dates
well... I do a lot of PHP, and have found that it's generally safer to
handle date arithmetic in Postges.
That's because PHP is a sucky language :)
For a job like this, I'd probably whip up something in Pike. Excellent
string handling, easy database access (including an efficient
implementation of the pgsql protocol - skips the usual underlying
library and talks directly to the server), and the best Unicode
support I've seen in any application language (now equalled by Python,
as of version 3.3 that just came out). Biggest downside is that it's a
tad obscure.
object db=Sql.Sql("pgsql://username:password@hostname/database");
db->query("select blah from blah");
ChrisA
"Raymond O'Donnell" <rod@iol.ie> writes:
On 20/10/2012 17:23, Tom Lane wrote:
FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.
Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?
Not sure what you're proposing? I don't see how you'd solve this
problem with a CTE, at least not without a join, which seems unlikely
to be a win.
regards, tom lane
Berend Tober wrote:
Raymond O'Donnell wrote:
On 20/10/2012 17:02, Berend Tober wrote:
Thalis Kalfigkopoulos wrote:
How would you get the previous reading (and perhaps the
previous read
date) to also appear ...Just include them in the SELECT:
Well, that is surprisingly easy!
How about this then: the table includes data for more than one
meter....
Almost answering my own question. Adding the meter key to the lag:
SELECT
electric_meter_pk,
lag(reading_date)
OVER(ORDER BY electric_meter_pk,reading_date)
as prev_date,
reading_date,
lag(meter_reading)
OVER(ORDER BY electric_meter_pk,reading_date)
AS prev_reading,
meter_reading,
meter_reading - lag(meter_reading)
OVER(ORDER BY electric_meter_pk,reading_date)
AS kWh_diff,
reading_date - lag(reading_date)
OVER(ORDER BY electric_meter_pk,reading_date)
as num_service_days
FROM home.electric
order by 1,3;
Gives all good as far as lining up dates, except it does not
cross the new-meter boundary gracefully:
2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 | 29
2 | 2009-08-12 | 2009-09-14 | 151584 | 152941 | 1357 | 33
*3 | 2009-09-14 | 2009-06-26 | 152941 | 68502 | -84439 |-80
3 | 2009-06-26 | 2009-08-13 | 68502 | 69738 | 1236 | 48
3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 | 30
...
3 | 2012-05-04 | 2012-06-07 | 116091 | 117469 | 1378 | 34
3 | 2012-06-07 | 2012-07-06 | 117469 | 118953 | 1484 | 29
3 | 2012-07-06 | 2012-07-18 | 118953 | 119185 | 232 | 12
*4 | 2012-07-18 | 2012-07-18 | 119185 | 0 | -119185 | 0
4 | 2012-07-18 | 2012-08-06 | 0 | 887 | 887 | 19
4 | 2012-08-06 | 2012-09-07 | 887 | 2158 | 1271 | 32
4 | 2012-09-07 | 2012-10-05 | 2158 | 3018 | 860 | 28
The first-row-initialization problem is what lead me to consider
a recursive CTE. I have something that works and does not use
window functions, but I think it requires more detailed
explanation than I have prepared at this time.
On 20/10/2012 17:50, Tom Lane wrote:
"Raymond O'Donnell" <rod@iol.ie> writes:
On 20/10/2012 17:23, Tom Lane wrote:
FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?Not sure what you're proposing? I don't see how you'd solve this
problem with a CTE, at least not without a join, which seems unlikely
to be a win.
Yes, I see what you mean.... was waving my hands a bit. :-)
Ray.
--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
2012/10/20 Berend Tober <btober@broadstripe.net>:
Your suggestion almost worked as is for this, except that you have to note
that reading for meter #2 and meter #3 overlap (I briefly owned two houses),
and that seemed to confuse the lag() function:SELECT
electric_meter_pk,
lag(reading_date)
OVER(ORDER BY reading_date) as prev_date,
reading_date,
lag(meter_reading)
OVER(ORDER BY reading_date) AS prev_reading,
meter_reading,
meter_reading - lag(meter_reading)
OVER(ORDER BY reading_date) AS kWh_diff,
reading_date - lag(reading_date)
OVER(ORDER BY reading_date) as num_service_days
FROM electric
order by 1,3;2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 | 1179 |32
2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 | 1365 |30
2 | 2009-06-26 | 2009-07-14 | 68502 | 149808 | 81306 |18
2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 |29
2 | 2009-09-12 | 2009-09-14 | 70934 | 152941 | 82007 | 2
3 | 2009-06-10 | 2009-06-26 | 148139 | 68502 |-79637 |16
3 | 2009-08-12 | 2009-08-13 | 151584 | 69738 |-81846 | 1
3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 |30
3 | 2009-09-14 | 2009-10-14 | 152941 | 71918 |-81023 |30
3 | 2009-10-14 | 2009-11-11 | 71918 | 72952 | 1034 |28
You can do
… OVER(PARTITION BY electric_meter_pk ORDER BY reading_date)
to split you data by meter.
--
Victor Y. Yegorov
Виктор Егоров wrote:
2012/10/20 Berend Tober <btober@broadstripe.net>:
Your suggestion almost worked as is for this, except that you have to note
that reading for meter #2 and meter #3 overlap ...You can do
… OVER(PARTITION BY electric_meter_pk ORDER BY reading_date)
to split you data by meter.
That looks like it works great! Much simpler-looking SQL than what I was
working on.
Thanks!
On 2012-10-20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Raymond O'Donnell" <rod@iol.ie> writes:
On 20/10/2012 17:23, Tom Lane wrote:
FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?Not sure what you're proposing? I don't see how you'd solve this
problem with a CTE, at least not without a join, which seems unlikely
to be a win.
select generate_series(1,20) as a ,
( generate_series(1,20)*9+random()*7)::int as b
into temp table foo;
delete from foo where random()>0.5;
with
j as ( select f,lag(f) over (order by f.a)as g from foo as f )
select (f).*,(g).a as "lag a",(g).b as "lag b",
(f).a-(g).a as "diff(a)" ,(f).b-(g).b as "diff(b)" from j;
--
⚂⚃ 100% natural
On 2012-10-20, Berend Tober <btober@broadstripe.net> wrote:
Thalis Kalfigkopoulos wrote:
On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 20/10/2012 11:54, ochaussavoine wrote:
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:I think you can do it with a window function.
In particular you're looking probably for the lag() window function.
What about if there is more than one column you want the
difference for (... coincidentally I am writing a article on this
topic right now! ...), say a table which is used to record a
metered quantity at not-quite regular intervals:CREATE TABLE electricity
(
current_reading_date date,
current_meter_reading integer
);with sample data:
'2012-09-07',2158
'2012-10-05',3018and I want an output such as:
Meter Read on October 5
Current Previous kWh
Reading Reading Used
-----------------------------------
3018 - 2158 = 860Number service days = 28
I am working on a write-up of a neat solution using CTE's, but
would be interested in other's views.
electricity meter may bis a bad example as usage meters often have
fewer digits than are needed to track all historical usage
eg:
'2012-05-07',997743
'2012-06-06',999601
'2012-07-05',000338
'2012-08-06',001290
'2012-09-07',002158
'2012-10-05',003018
--
⚂⚃ 100% natural