Window function for get the last value to extend missing rows

Started by Durumdaraalmost 3 years ago6 messagesgeneral
Jump to latest
#1Durumdara
durumdara@gmail.com

Dear Members!

I have a table with temperature measures.
The data is coming from the PLC, but sometimes the period is "slipping", so
the values are not correctly minute based.

03:00 10
03:02 12
03:03 11
03:05 13

I have to make a virtual table which is minute based.

I thought I would make a generated temp table (generate_series) and then
join these values based on minue.

03:00 10
03:01 NULL
03:02 12
03:03 11
03:04 NULL
03:05 13

I need a code to replace the value to the last value on NULL.

03:00 10
03:01 10 <
03:02 12
03:03 11
03:04 11 <
03:05 13

The LAG function seems to be ok, but how to handle if more than two periods
are missing?

03:00 10
03:01 NULL
03:02 NULL
03:03 NULL
03:04 11
03:05 13

select *, coalesce(value, prev_value) as value from (
select mmin, value,
lag(value, 1) over (order by mmin) as prev_value
from test_table) t

or

select *, coalesce(value, prev_value) as value from (
select mmin, value,
coalesce(lag(value, 1) over (order by mmin),
lag(value, 2) over (order by mmin)) as prev_value
from tmp_test_table) t

The test data:

create table tmp_test_table (mmin int, value int);
insert into tmp_test_table values
(1, 1),
(2, 1),
(3, NULL),
(4, NULL),
(5, 2),
(6, NULL),
(7, NULL),
(10, 10),
(11, NULL),
(12, NULL),
(13, NULL),
(14, NULL);

The result is:

1 1 NULL 1
2 1 1 1
3 NULL 1 1
4 NULL 1 1
5 2 NULL 2
6 NULL 2 2
7 NULL 2 2
10 10 NULL 10
11 NULL 10 10
12 NULL 10 10
13 NULL NULL NULL
14 NULL NULL NULL

So you can see, the last values are NULL because the LAG can't use the last
calculated value.

Do you have any idea how to get the last value, doesn't matter how many
NULL-s are in the set?

(15, NULLx20, 10) => 15x21, 10

Thanks

Best regards
dd

#2GF
phabriz@gmail.com
In reply to: Durumdara (#1)
Re: Window function for get the last value to extend missing rows

On Fri, 12 May 2023 at 13:04, Durumdara <durumdara@gmail.com> wrote:

Dear Members!

I have a table with temperature measures.
The data is coming from the PLC, but sometimes the period is "slipping",
so the values are not correctly minute based.

03:00 10
03:02 12
03:03 11
03:05 13

I have to make a virtual table which is minute based.

I thought I would make a generated temp table (generate_series) and then
join these values based on minue.

03:00 10
03:01 NULL
03:02 12
03:03 11
03:04 NULL
03:05 13

I need a code to replace the value to the last value on NULL.

03:00 10
03:01 10 <
03:02 12
03:03 11
03:04 11 <
03:05 13

Unfortunately, as per
https://www.postgresql.org/docs/current/functions-window.html
"The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead,
lag, first_value, last_value, and nth_value. This is not implemented in
PostgreSQL: the behavior is always the same as the standard's default,
namely RESPECT NULLS".

So, I'd keep the incoming data as is, i.e. with no nulls in values, and
densify it with some generate_series magic:
select gmin as mmin, d.value
from
( select mmin, lead(mmin) over (order by mmin) nextmin, value from
test_table ) d,
generate_series(d.mmin, nextmin - interval'1 minute') gmin

(I assumed a time representation for mmin, but adapt the generate_series
call to whatever your representation is)
Best,
g

#3Thorsten Glaser
tg@evolvis.org
In reply to: GF (#2)
Re: Window function for get the last value to extend missing rows

On Fri, 12 May 2023, GF wrote:

"The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead,
lag, first_value, last_value, and nth_value. This is not implemented in
PostgreSQL: the behavior is always the same as the standard's default,
namely RESPECT NULLS".

Yeah, THAT caused no small amount of cursing, earlier this year,
I’d have also used IGNORE NULLS somewhere…

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

#4Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Durumdara (#1)
Re: Window function for get the last value to extend missing rows

"Durumdara" == Durumdara <durumdara@gmail.com> writes:

Durumdara> I have to make a virtual table which is minute based.

Durumdara> I thought I would make a generated temp table
Durumdara> (generate_series) and then join these values based on minue.

Durumdara> 03:00 10
Durumdara> 03:01 NULL
Durumdara> 03:02 12
Durumdara> 03:03 11
Durumdara> 03:04 NULL
Durumdara> 03:05 13

Durumdara> I need a code to replace the value to the last value on
Durumdara> NULL.

It is possible to do this, even without the IGNORE NULLS option for
window functions, but it's somewhat awkward.

Someone else already mentioned using generate_series in lateral position
to fill in values; in most cases this is probably the best approach,
though it requires a little care.

There is also a window function approach based on using the non-null
values to delimit partitions:

create table tmp_test_table(mmin,val)
as select o, v
from unnest(array[1,5,NULL,3,NULL,NULL,10,7,NULL,NULL,NULL,4])
with ordinality as u(v,o);
select * from tmp_test_table order by mmin;
mmin | val
------+-----
1 | 1
2 | 5
3 |
4 | 3
5 |
6 |
7 | 10
8 | 7
9 |
10 |
11 |
12 | 4
(12 rows)

First we take advantage of the fact that mmin is increasing to generate
a distinguishing value for each block of nulls:

select *,
max(case when val is not null then mmin end)
over (order by mmin) as grp
from tmp_test_table
order by mmin;
mmin | val | grp
------+-----+-----
1 | 1 | 1
2 | 5 | 2
3 | | 2
4 | 3 | 4
5 | | 4
6 | | 4
7 | 10 | 7
8 | 7 | 8
9 | | 8
10 | | 8
11 | | 8
12 | 4 | 12
(12 rows)

Then we can fill in the missing vals by using the fact that there is
at most one non-null val in each group:

select *,
max(val) over (partition by grp) as val2
from (select *,
max(case when val is not null then mmin end)
over (order by mmin) as grp
from tmp_test_table) s
order by mmin;
mmin | val | grp | val2
------+-----+-----+------
1 | 1 | 1 | 1
2 | 5 | 2 | 5
3 | | 2 | 5
4 | 3 | 4 | 3
5 | | 4 | 3
6 | | 4 | 3
7 | 10 | 7 | 10
8 | 7 | 8 | 7
9 | | 8 | 7
10 | | 8 | 7
11 | | 8 | 7
12 | 4 | 12 | 4
(12 rows)

This _really_ isn't efficient, though; you end up with typically three
sorts of the data. For a one-off operation or for generating a
materialized view it might be acceptable.

--
Andrew (irc:RhodiumToad)

#5Kirk Wolak
wolakk@gmail.com
In reply to: Andrew Gierth (#4)
Re: Window function for get the last value to extend missing rows

On Sat, May 13, 2023 at 2:18 AM Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:

"Durumdara" == Durumdara <durumdara@gmail.com> writes:

create table tmp_test_table(mmin,val)
as select o, v
from unnest(array[1,5,NULL,3,NULL,NULL,10,7,NULL,NULL,NULL,4])
with ordinality as u(v,o);
select * from tmp_test_table order by mmin;

That seems like a lot of work.
If you have ALL the values (no missing values) a simple CTE handles this:

https://www.db-fiddle.com/f/wKyQV1imGsewR9Az7hi193/0

WITH RECURSIVE rec_cte(mmin, value) AS (
SELECT mmin, value from tmp_test_table where mmin=1
UNION ALL
SELECT t.mmin, COALESCE(t.value,r.value)
FROM tmp_test_table t, rec_cte r WHERE r.mmin=(t.mmin-1)
)
SELECT * from rec_cte order by mmin;

#6Noname
postmaster@heinz-it.de
In reply to: Durumdara (#1)
Re: Window function for get the last value to extend missing rows

Am 12.05.23 um 13:04 schrieb Durumdara:

[...]

The LAG function seems to be ok, but how to handle if more than two periods
are missing?

03:00 10
03:01 NULL
03:02 NULL
03:03 NULL
03:04 11
03:05 13

[...]

and how do you think about NULL in first(and second/third) row?