Group By, NULL values and inconsistent behaviour.

Started by Nonamealmost 28 years ago10 messages
#1Noname
darrenk@insightdist.com

While looking thru the nodeGroup code, I noticed the following
that I'm not sure is correct.

-- Using 01-09 snapshot
create table t1 (a int4, b char(2), c char(2));
CREATE
insert into t1 (a,c) values (1,'x');
INSERT 149419 1
insert into t1 (a,c) values (2,'x');
INSERT 149420 1
insert into t1 (a,c) values (3,'z');
INSERT 149421 1
insert into t1 (a,c) values (2,'x');
INSERT 149422 1
select * from t1;
a|b|c
-+-+--
1| |x
2| |x
3| |z
2| |x
(4 rows)

select b,c,sum(a) from t1 group by b,c;
b|c |sum
-+--+---
|x | 3
|z | 3
|x | 2
(3 rows)

select b,c,sum(a) from t1 group by b,c order by c;
b|c |sum
-+--+---
|x | 3
|x | 2
|z | 3
(3 rows)

In the second query, the first two rows have been grouped, but shouldn't
they not be since b is NULL? I thought that NULL != NULL?

If so, is the third query wrong? The first two rows are different, but
only because of the aggregated column that is the source of the group by.
According to the logic from the second query, these should have been
grouped, no?

What does the standard say about comparing two NULL values?

The fixes for these inconsistencies appear to be simple. To cause a new
group to be started if NULL != NULL, simply change the "continue;" in the
sameGroup function in nodeGroup.c to "return FALSE;" Ignoring aggregated
columns would also then be added to sameGroup().

darrenk

#2Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#1)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

Where are we on this. It appears this NULL group by is seriously
broken.

Can we have some tests on commercial databases, and get a patch
generated?

While looking thru the nodeGroup code, I noticed the following
that I'm not sure is correct.

-- Using 01-09 snapshot
create table t1 (a int4, b char(2), c char(2));
CREATE
insert into t1 (a,c) values (1,'x');
INSERT 149419 1
insert into t1 (a,c) values (2,'x');
INSERT 149420 1
insert into t1 (a,c) values (3,'z');
INSERT 149421 1
insert into t1 (a,c) values (2,'x');
INSERT 149422 1
select * from t1;
a|b|c
-+-+--
1| |x
2| |x
3| |z
2| |x
(4 rows)

select b,c,sum(a) from t1 group by b,c;
b|c |sum
-+--+---
|x | 3
|z | 3
|x | 2
(3 rows)

select b,c,sum(a) from t1 group by b,c order by c;
b|c |sum
-+--+---
|x | 3
|x | 2
|z | 3
(3 rows)

In the second query, the first two rows have been grouped, but shouldn't
they not be since b is NULL? I thought that NULL != NULL?

If so, is the third query wrong? The first two rows are different, but
only because of the aggregated column that is the source of the group by.
According to the logic from the second query, these should have been
grouped, no?

What does the standard say about comparing two NULL values?

The fixes for these inconsistencies appear to be simple. To cause a new
group to be started if NULL != NULL, simply change the "continue;" in the
sameGroup function in nodeGroup.c to "return FALSE;" Ignoring aggregated
columns would also then be added to sameGroup().

darrenk

--
Bruce Momjian
maillist@candle.pha.pa.us

#3Zeugswetter Andreas DBT
Andreas.Zeugswetter@telecom.at
In reply to: Bruce Momjian (#2)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

The following is Informix behavior:

informix@zeus:/usr/informix72> dbaccess - -

create database nulltest in datadbs;

Database created.

create table t1 (a int4, b char(2), c char(2));

201: A syntax error has occurred.
Error in line 1
Near character position 20

create table t1 (a int, b char(2), c char(2));

Table created.

insert into t1 (a,c) values (1,'x');

1 row(s) inserted.

insert into t1 (a,c) values (2,'x');

1 row(s) inserted.

insert into t1 (a,c) values (3,'z');

1 row(s) inserted.

insert into t1 (a,c) values (2,'x');

1 row(s) inserted.

select * from t1;

a b c
1 x
2 x
3 z
2 x

4 row(s) retrieved.

select b,c,sum(a) from t1 group by b,c;

b c (sum)

x 5
z 3

2 row(s) retrieved.

select b,c,sum(a) from t1 group by b,c order by c;

b c (sum)

x 5
z 3

2 row(s) retrieved.

Andreas

#4Zeugswetter Andreas DBT
Andreas.Zeugswetter@telecom.at
In reply to: Zeugswetter Andreas DBT (#3)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

In the second query, the first two rows have been grouped, but

shouldn't

they not be since b is NULL? I thought that NULL != NULL?

Note that:
NULL <> NULL is false
NULL = NULL is false

select * from t1 x, t1 y where x.b <> y.b;

a b c a b c
No rows found.

select * from t1 x, t1 y where x.b = y.b;

a b c a b c
No rows found.

select * from t1 x, t1 y where not x.b = y.b;

a b c a b c
No rows found.

select * from t1 x, t1 y where not x.b <> y.b;

a b c a b c
No rows found.

select * from t1 where a = b;

a b c
No rows found.

select * from t1 where a <> b;

a b c
No rows found.

The false seems not to be commutative.
Feel free to ask for more
Andreas

#5Noname
darrenk@insightdist.com
In reply to: Zeugswetter Andreas DBT (#4)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

The following is Informix behavior:

informix@zeus:/usr/informix72> dbaccess - -

create database nulltest in datadbs;

Database created.

create table t1 (a int, b char(2), c char(2));

Table created.

insert into t1 (a,c) values (1,'x');

1 row(s) inserted.

insert into t1 (a,c) values (2,'x');

1 row(s) inserted.

insert into t1 (a,c) values (3,'z');

1 row(s) inserted.

insert into t1 (a,c) values (2,'x');

1 row(s) inserted.

select * from t1;

a b c
1 x
2 x
3 z
2 x

4 row(s) retrieved.

select b,c,sum(a) from t1 group by b,c;

b c (sum)

x 5
z 3

2 row(s) retrieved.

Here is where postgres seems to differ. Seems postgres is missing
an implicit sort so that the grouping is done properly.

Postgres will return _three_ rows...

b c (sum)
x 3
z 3
x 2

select b,c,sum(a) from t1 group by b,c order by c;

b c (sum)

x 5
z 3

2 row(s) retrieved.

Even with the order by, postgres still returns _three_ rows...

b c (sum)
x 3
x 2
z 3

For now, ignore the patch I sent. Appears from Andreas demo that the
current postgres code will follow the Informix style with regard to
grouping columns with NULL values. Now that I really think about it,
it does make more sense.

But there is still a problem.

Does the SQL standard say anything about an implied sort when
grouping or is it up to the user to include an ORDER BY clause?

darrenk

#6Michael J. Maravillo
mmj@philonline.com
In reply to: Noname (#5)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

On Mon, 26 Jan 1998, Darren King wrote:

The following is Informix behavior:

select b,c,sum(a) from t1 group by b,c;

b c (sum)

x 5
z 3

2 row(s) retrieved.

Here is where postgres seems to differ. Seems postgres is missing
an implicit sort so that the grouping is done properly.

Postgres will return _three_ rows...

b c (sum)
x 3
z 3
x 2

I'm running the current cvs and it gives me this.

select b,c,sum(a) from t1 group by b,c;
b|c |sum
-+--+---
|x | 1
|x | 2
|z | 3
|x | 2
(4 rows)

select b,c,sum(a) from t1 group by b,c order by c;

b c (sum)

x 5
z 3

2 row(s) retrieved.

Even with the order by, postgres still returns _three_ rows...

b c (sum)
x 3
x 2
z 3

select b,c,sum(a) from t1 group by b,c order by c;
b|c |sum
-+--+---
|x | 1
|x | 2
|x | 2
|z | 3
(4 rows)

For now, ignore the patch I sent. Appears from Andreas demo that the
current postgres code will follow the Informix style with regard to
grouping columns with NULL values. Now that I really think about it,
it does make more sense.

I think I saw the patch committed this morning...?

Mike
[ Michael J. Maravillo Philippines Online ]
[ System Administrator PGP KeyID: 470AED9D InfoDyne, Incorporated ]
[ http://www.philonline.com/~mmj/ (632) 890-0204 ]

#7The Hermit Hacker
scrappy@hub.org
In reply to: Michael J. Maravillo (#6)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

On Tue, 27 Jan 1998, Michael J. Maravillo wrote:

For now, ignore the patch I sent. Appears from Andreas demo that the
current postgres code will follow the Informix style with regard to
grouping columns with NULL values. Now that I really think about it,
it does make more sense.

I think I saw the patch committed this morning...?

Yesterday evening, actually...should we back it out, or leave it
as is? Is the old way more corrrect the the new?

#8Noname
ocie@paracel.com
In reply to: Bruce Momjian (#2)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

Bruce Momjian wrote:

Where are we on this. It appears this NULL group by is seriously
broken.

Can we have some tests on commercial databases, and get a patch
generated?

I ran the test on Sybase. The only real changes were int4->int and
explicitly calling out field b as null (it defaults to not null).

1> select @@version
2> go

-----------------------------------------------------------------------------
SQL Server/11.0.2/P/Sun_svr4/OS 5.4/EBF 6536/OPT/Sat Aug 17 11:54:59 PDT 1996

(1 row affected)
1> create table t1 (a int, b char(2) null, c char(2))
2> go
1> insert into t1 (a,c) values (1,'x')
2> go
(1 row affected)
1> insert into t1 (a,c) values (2,'x')
2> go
(1 row affected)
1> insert into t1 (a,c) values (3,'z')
2> go
(1 row affected)
1> insert into t1 (a,c) values (2,'x')
2> go
(1 row affected)
1> select * from t1
2> go
a b c
----------- -- --
1 NULL x
2 NULL x
3 NULL z
2 NULL x

(4 rows affected)
1> select b,c,sum(a) from t1 group by b,c
2> go
b c
-- -- -----------
NULL x 5
NULL z 3

(2 rows affected)
1> select b,c,sum(a) from t1 group by b,c order by c
2> go
b c
-- -- -----------
NULL x 5
NULL z 3

(2 rows affected)

It seems that Sybase thinks a null is a null in this case. However,
try the following:

select * from t1 x, t1 y where x.b=y.b and y.c='z';

Sybase returns zero rows for this. It seems that it treats NULLs as
equal for order and group operations, but not for join operations.

Ocie Mitchell

#9Noname
darrenk@insightdist.com
In reply to: Noname (#8)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

For now, ignore the patch I sent. Appears from Andreas demo that the
current postgres code will follow the Informix style with regard to
grouping columns with NULL values. Now that I really think about it,
it does make more sense.

I think I saw the patch committed this morning...?

Yesterday evening, actually...should we back it out, or leave it
as is? Is the old way more corrrect the the new?

Marc,

From the two responses demonstrating Informix and Sybase, I think the patch

I sent should be backed out. I can live with NULL equaling NULL in the
GROUP BY and ORDER BY clauses, but not in the WHERE clause, if everyone else
is doing it that way.

darrenk

#10Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#9)
Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

For now, ignore the patch I sent. Appears from Andreas demo that the
current postgres code will follow the Informix style with regard to
grouping columns with NULL values. Now that I really think about it,
it does make more sense.

I think I saw the patch committed this morning...?

Yesterday evening, actually...should we back it out, or leave it
as is? Is the old way more corrrect the the new?

Marc,

From the two responses demonstrating Informix and Sybase, I think the patch

I sent should be backed out. I can live with NULL equaling NULL in the
GROUP BY and ORDER BY clauses, but not in the WHERE clause, if everyone else
is doing it that way.

darrenk

Your patch backed out.

--
Bruce Momjian
maillist@candle.pha.pa.us