Questionable result from lead(0) IGNORE NULLS
Hi Oliver,
I noticed a questionable result from "lead(0) IGNORE NULLS".
CREATE TEMP TABLE g(x INT, y INT);
CREATE TABLE
INSERT INTO g (VALUES(NULL,1),(NULL,2),(1,3));
INSERT 0 3
SELECT * FROM g;
x | y
---+---
| 1
| 2
1 | 3
(3 rows)
SELECT x, y, lead(x, 0) RESPECT NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 | 1
(3 rows)
SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 | 1
(3 rows)
As you can see, "lead(x, 0) IGNORE NULLS" shows the same result as
"lead(x, 0) RESPECT NULLS". IMO "lead(x, 0) IGNORE NULLS" should show
something like:
x | y | lead
---+---+------
| 1 | 1
| 2 | 1
1 | 3 | 1
(3 rows)
The same thing can be said to lag().
Looking into the code, in
WinGetFuncArgInPartition(src/backend/executor/nodeWindowAgg.c) I see
this:
if (winobj->ignore_nulls == IGNORE_NULLS && relpos != 0)
{
null_treatment = true;
Here, if the caller is lead(0), then relpos == 0, thus
"null_treatment" is not set to true and falls into the code later on:
if (!null_treatment) /* IGNORE NULLS is not specified */
{
/* get tupple and evaluate in a partition */
datum = gettuple_eval_partition(winobj, argno,
and runs through the same code path as RESPECT NULLS. I think this is
the reason why "lead(0, x) IGNORE NULLS" showed the same result as
"lead(0, x) RESPECT NULLS". "relpos != 0" part was originally in your
patch. Oliver, what's the reason why you excluded relpose==0 case? Can
we eliminate the restriction and let "lead(0) IGNORE NULLS" case run
the same code path as relpos!=0 (of course with proper adjustment in
related code)?
Best regards,
[1]: /messages/by-id/CAGMVOduHcfhh7Wo9W1Tff0DH_ccPuQGc8D_f5S2_y4OHFOjn=A@mail.gmail.com -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Tue, Oct 7, 2025 at 8:41 AM Tatsuo Ishii <ishii@postgresql.org> wrote:
As you can see, "lead(x, 0) IGNORE NULLS" shows the same result as
"lead(x, 0) RESPECT NULLS". IMO "lead(x, 0) IGNORE NULLS" should show
something like:x | y | lead
---+---+------
| 1 | 1
| 2 | 1
1 | 3 | 1
(3 rows)Looking into the code, in
WinGetFuncArgInPartition(src/backend/executor/nodeWindowAgg.c) I see
this:
if (winobj->ignore_nulls == IGNORE_NULLS && relpos != 0)
{
null_treatment = true;Here, if the caller is lead(0), then relpos == 0, thus
"null_treatment" is not set to true and falls into the code later on:if (!null_treatment) /* IGNORE NULLS is not specified */
{
/* get tupple and evaluate in a partition */
datum = gettuple_eval_partition(winobj, argno,and runs through the same code path as RESPECT NULLS. I think this is
the reason why "lead(0, x) IGNORE NULLS" showed the same result as
"lead(0, x) RESPECT NULLS". "relpos != 0" part was originally in your
patch. Oliver, what's the reason why you excluded relpose==0 case? Can
we eliminate the restriction and let "lead(0) IGNORE NULLS" case run
the same code path as relpos!=0 (of course with proper adjustment in
related code)?
The result looks wrong. So I've just tried removing the "&& relpos != 0"
and I get:
SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 |
(3 rows)
Nothing appears for lead at all. So it was doing something but doesn't look
like it handles the lead(x, 0) case, but it does handle lead(x) - which is
the same as lead(x, 1):
SELECT x, y, lead(x) IGNORE NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 | 1
| 2 | 1
1 | 3 |
(3 rows)
Without Ignore Nulls, lead(x,0) is just the row itself so you'd never use
that function. But yes this case needs to be handled, I'll look through the
code again, handle this for lead/lag, and add tests.
Hi Oliver,
I have just pushed a change to WinGetFuncArgInPartition() in
nodeWindowAgg.c to fix Coverity issues. So please update your git
respository.
The result looks wrong. So I've just tried removing the "&& relpos != 0"
and I get:SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 |
(3 rows)Nothing appears for lead at all. So it was doing something but doesn't look
like it handles the lead(x, 0) case
I think we need to change this:
forward = relpos > 0 ? 1 : -1;
:
:
/*
* Get the next nonnull value in the partition, moving forward or backward
* until we find a value or reach the partition's end.
*/
do
{
int nn_info; /* NOT NULL info */
abs_pos += forward;
if (abs_pos < 0) /* apparently out of partition */
break;
In lead(0, x) case, abs_pos==0 and foward==-1. So it exits the loop
due to out of partition. Probably we need to change
forward = relpos > 0 ? 1 : -1;
to
forward = relpos >= 0 ? 1 : -1;
and change the do..while loop to a for loop?
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Hi Oliver,
Nothing appears for lead at all. So it was doing something but doesn't look
like it handles the lead(x, 0) caseI think we need to change this:
forward = relpos > 0 ? 1 : -1;
:
:
/*
* Get the next nonnull value in the partition, moving forward or backward
* until we find a value or reach the partition's end.
*/
do
{
int nn_info; /* NOT NULL info */abs_pos += forward;
if (abs_pos < 0) /* apparently out of partition */
break;In lead(0, x) case, abs_pos==0 and foward==-1. So it exits the loop
due to out of partition. Probably we need to change
forward = relpos > 0 ? 1 : -1;
to
forward = relpos >= 0 ? 1 : -1;
and change the do..while loop to a for loop?
Attached patch is written in this direction. What do you think?
Below are the results. IMO now lead() returns correct results.
psql -a -f window.sql test
CREATE TEMP TABLE g(x INT, y INT);
CREATE TABLE
INSERT INTO g (VALUES(NULL,1),(NULL,2),(1,3));
INSERT 0 3
SELECT * FROM g;
x | y
---+---
| 1
| 2
1 | 3
(3 rows)
SELECT x, y, lead(x, 0) RESPECT NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 | 1
(3 rows)
SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 | 1
| 2 | 1
1 | 3 | 1
(3 rows)
SELECT x, y, lead(x, 1) IGNORE NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 |
(3 rows)
While working on this, I found some of window function regression
tests using lead/lag are not quite correct. Below is some of them.
-- lead
SELECT name,
orbit,
lead(orbit) OVER w AS lead,
lead(orbit) RESPECT NULLS OVER w AS lead_respect,
lead(orbit) IGNORE NULLS OVER w AS lead_ignore
FROM planets
WINDOW w AS (ORDER BY name)
;
name | orbit | lead | lead_respect | lead_ignore
---------+-------+-------+--------------+-------------
earth | | 4332 | 4332 | 4332
jupiter | 4332 | | | 88
mars | | 88 | 88 | 88
mercury | 88 | 60182 | 60182 | 60182
neptune | 60182 | 90560 | 90560 | 90560
pluto | 90560 | 24491 | 24491 | 24491
saturn | 24491 | | | 224
uranus | | 224 | 224 | 224
venus | 224 | | |
xyzzy | | | |
(10 rows)
Why lead_ignore shows "4332" on the first row? Since "orbit"'s second
non null row is orbit==88, I think lead(orbit) should return 88,
rather than 4332 if my understanding of the SQL standard is correct.
IMO the right result is as below, which is actually the one after
applying the patch.
name | orbit | lead | lead_respect | lead_ignore
---------+-------+-------+--------------+-------------
earth | | 4332 | 4332 | 88
jupiter | 4332 | | | 88
mars | | 88 | 88 | 60182
mercury | 88 | 60182 | 60182 | 60182
neptune | 60182 | 90560 | 90560 | 90560
pluto | 90560 | 24491 | 24491 | 24491
saturn | 24491 | | | 224
uranus | | 224 | 224 |
venus | 224 | | |
xyzzy | | | |
(10 rows)
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachments:
v1-0001-Fix-window-function-lead-lag-incorrect-results.patchapplication/octet-streamDownload
From 3b633b071c7eb7445553761440b3caf8157af1d7 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@postgresql.org>
Date: Wed, 8 Oct 2025 13:26:57 +0900
Subject: [PATCH v1] Fix window function lead/lag incorrect results.
Window function lead/lag returned incorrect results if IGNORE NULLS
option is specified. Also this commit fixes if "offse=0" is given to
lead/lag, they returned incorrect results.
---
src/backend/executor/nodeWindowAgg.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 0698aae37a7..f9a9700dfa8 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -3727,12 +3727,13 @@ WinGetFuncArgInPartition(WindowObject winobj, int argno,
Assert(WindowObjectIsValid(winobj));
winstate = winobj->winstate;
- if (winobj->ignore_nulls == IGNORE_NULLS && relpos != 0)
+ /* prepare thing if IGNORE NULLS */
+ if (winobj->ignore_nulls == IGNORE_NULLS)
{
- null_treatment = true;
- notnull_offset = 0;
- notnull_relpos = abs(relpos);
- forward = relpos > 0 ? 1 : -1;
+ null_treatment = true; /* we need to consider IGNORE NULLS */
+ notnull_offset = 0; /* last NOT NULL row offset */
+ notnull_relpos = abs(relpos); /* NOT NULL row absolute pos */
+ forward = relpos >= 0 ? 1 : -1; /* 1: move forward, -1: otherwise */
}
switch (seektype)
@@ -3778,18 +3779,18 @@ WinGetFuncArgInPartition(WindowObject winobj, int argno,
* Get the next nonnull value in the partition, moving forward or backward
* until we find a value or reach the partition's end.
*/
- do
+ for (; abs_pos >= 0; abs_pos += forward)
{
int nn_info; /* NOT NULL info */
- abs_pos += forward;
- if (abs_pos < 0) /* apparently out of partition */
- break;
-
/* check NOT NULL cached info */
nn_info = get_notnull_info(winobj, abs_pos);
if (nn_info == NN_NOTNULL) /* this row is known to be NOT NULL */
+ {
notnull_offset++;
+ if (notnull_offset > notnull_relpos)
+ break; /* target row found */
+ }
else if (nn_info == NN_NULL) /* this row is known to be NULL */
continue; /* keep on moving forward or backward */
@@ -3802,11 +3803,15 @@ WinGetFuncArgInPartition(WindowObject winobj, int argno,
if (myisout) /* out of partition? */
break;
if (!*isnull)
+ {
notnull_offset++;
+ if (notnull_offset > notnull_relpos)
+ break; /* target row found */
+ }
/* record the row status */
put_notnull_info(winobj, abs_pos, *isnull);
}
- } while (notnull_offset < notnull_relpos);
+ }
/* get tupple and evaluate in a partition */
datum = gettuple_eval_partition(winobj, argno,
--
2.43.0
Hi Oliver,
After studying the standard more, it seems I was totally wrong.
In summary, current code is correct. Sorry for noise.
From the standard explaining lead():
B) If OFFSET = 0 (zero), then the value of <window function> is the
value of VE1 evaluated for the current row.
(here VE1 referes to the first argument of lead()).
So it seems we need to treat offset==0 case specially. i.e. eval VE1
and do not through away even if the result is NULL.
I noticed a questionable result from "lead(0) IGNORE NULLS".
CREATE TEMP TABLE g(x INT, y INT);
CREATE TABLE
INSERT INTO g (VALUES(NULL,1),(NULL,2),(1,3));
INSERT 0 3
SELECT * FROM g;
x | y
---+---
| 1
| 2
1 | 3
(3 rows)SELECT x, y, lead(x, 0) RESPECT NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 | 1
(3 rows)SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g
WINDOW w AS (ORDER BY y);
x | y | lead
---+---+------
| 1 |
| 2 |
1 | 3 | 1
(3 rows)As you can see, "lead(x, 0) IGNORE NULLS" shows the same result as
"lead(x, 0) RESPECT NULLS". IMO "lead(x, 0) IGNORE NULLS" should show
something like:
No. The result above is perfectly correct.
if (winobj->ignore_nulls == IGNORE_NULLS && relpos != 0)
Now I understand why "relpos != 0" part is necessary.
Also, my email below was wrong.
While working on this, I found some of window function regression
tests using lead/lag are not quite correct. Below is some of them.-- lead
SELECT name,
orbit,
lead(orbit) OVER w AS lead,
lead(orbit) RESPECT NULLS OVER w AS lead_respect,
lead(orbit) IGNORE NULLS OVER w AS lead_ignore
FROM planets
WINDOW w AS (ORDER BY name)
;
name | orbit | lead | lead_respect | lead_ignore
---------+-------+-------+--------------+-------------
earth | | 4332 | 4332 | 4332
jupiter | 4332 | | | 88
mars | | 88 | 88 | 88
mercury | 88 | 60182 | 60182 | 60182
neptune | 60182 | 90560 | 90560 | 90560
pluto | 90560 | 24491 | 24491 | 24491
saturn | 24491 | | | 224
uranus | | 224 | 224 | 224
venus | 224 | | |
xyzzy | | | |
(10 rows)Why lead_ignore shows "4332" on the first row? Since "orbit"'s second
non null row is orbit==88, I think lead(orbit) should return 88,
rather than 4332 if my understanding of the SQL standard is correct.
According to the standard if OFFSET is not 0, then:
B) Otherwise, let TX be the sequence of values that is the result of applying VE1 to each
row of T that follows the current row and eliminating null values, ordered according
to the window ordering of WDX.
C) Otherwise, the value of <window function> is the m-th value of
TX, where m = OFFSET.
Thus TX does not include current row and for the first row, and
lead(orbit) returns the second row, that is orbit==4332 because m==1.
So, current regression test expected file is correct.
Again, sorry for noise.
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp