bug with aggregate + multi column index + index_scan

Started by Brian Hirtabout 20 years ago5 messagesbugs
Jump to latest
#1Brian Hirt
bhirt@mobygames.com

I've run across a rather nasty bug in 8.1.2. It seems when the
planer uses an index_scan within a GroupAggregate for a multi column
index you can get incorrect results. fwiw i also see this on a dual
xeon box running 8.1.1 and redhat 7.3.

I've created a simple test case that I hope isolates the problems
sufficiently.

x86imac:/tmp bhirt$ psql --echo-all --file=test weblogs
select version();
version
------------------------------------------------------------------------
----------------------------------------------------------------
PostgreSQL 8.1.2 on i686-apple-darwin8.4.1, compiled by GCC i686-
apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250)
(1 row)

create table test (
id1 int4,
id2 int4,
day date,
grp text,
v int4);
CREATE TABLE
create index test_idx on test (id1,id2,day,grp);
CREATE INDEX
insert into test values (1,1,'1/1/2006','there',1);
INSERT 0 1
insert into test values (1,1,'1/2/2006','there',2);
INSERT 0 1
insert into test values (1,1,'1/3/2006','there',3);
INSERT 0 1
insert into test values (1,1,'1/1/2006','hi',2);
INSERT 0 1
insert into test values (1,1,'1/2/2006','hi',3);
INSERT 0 1
insert into test values (1,1,'1/3/2006','hi',4);
INSERT 0 1
select grp,sum(v) from test where id1 = 1 and id2 = 1 and day between
'1/1/2006' and '1/31/2006' group by grp order by sum(v) desc;
grp | sum
-------+-----
hi | 4
hi | 3
there | 3
hi | 2
there | 2
there | 1
(6 rows)

set enable_indexscan to false;
SET
select grp,sum(v) from test where id1 = 1 and id2 = 1 and day between
'1/1/2006' and '1/31/2006' group by grp order by sum(v) desc;
grp | sum
-------+-----
hi | 9
there | 6
(2 rows)

x86imac:/tmp bhirt$

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian Hirt (#1)
Re: bug with aggregate + multi column index + index_scan

Brian Hirt <bhirt@mobygames.com> writes:

I've run across a rather nasty bug in 8.1.2. It seems when the
planer uses an index_scan within a GroupAggregate for a multi column
index you can get incorrect results.

Good catch. Looks to me like it only happens if two or more leading
index columns are equated to the same constant value, ie
where id1 = 1 and id2 = 1 and ...
will show the bug but
where id1 = 1 and id2 = 2 and ...
won't. Does that match up with the original behavior that led you to
make the test case?

The problem is that implied equality deduction causes the planner to
conclude id1 = id2, and this extra bit of info is confusing the code
that determines whether the index's sort order can be considered to
match the needs of the GROUP BY clause. So you get a plan that feeds
the IndexScan directly to GroupAggregate, which is wrong because the
data isn't sorted by "grp".

In the related case
select grp,sum(v) from test where id1 = 1 and id2 = 2 and
day = '1/1/2006' group by grp order by sum(v) desc;
it *is* OK to decide that the indexscan result is effectively sorted
by "grp", so it's important to have this check ... it's just not being
done quite right. Thanks for the test case!

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian Hirt (#1)
Re: bug with aggregate + multi column index + index_scan

Brian Hirt <bhirt@mobygames.com> writes:

I've run across a rather nasty bug in 8.1.2. It seems when the
planer uses an index_scan within a GroupAggregate for a multi column
index you can get incorrect results.

Patch is here if you need it before 8.1.3:
http://archives.postgresql.org/pgsql-committers/2006-01/msg00377.php

regards, tom lane

#4Brian Hirt
bhirt@mobygames.com
In reply to: Tom Lane (#2)
Re: bug with aggregate + multi column index + index_scan

Tom,

Yes, what you describe are exactly the circumstances that are
required for our query to fail. Once again, thanks for the great
help and quick fix. Do you think this fix will make 8.1.3?

Best Regards,

Brian Hirt

On Jan 29, 2006, at 9:31 AM, Tom Lane wrote:

Show quoted text

Brian Hirt <bhirt@mobygames.com> writes:

I've run across a rather nasty bug in 8.1.2. It seems when the
planer uses an index_scan within a GroupAggregate for a multi column
index you can get incorrect results.

Good catch. Looks to me like it only happens if two or more leading
index columns are equated to the same constant value, ie
where id1 = 1 and id2 = 1 and ...
will show the bug but
where id1 = 1 and id2 = 2 and ...
won't. Does that match up with the original behavior that led you to
make the test case?

The problem is that implied equality deduction causes the planner to
conclude id1 = id2, and this extra bit of info is confusing the code
that determines whether the index's sort order can be considered to
match the needs of the GROUP BY clause. So you get a plan that feeds
the IndexScan directly to GroupAggregate, which is wrong because the
data isn't sorted by "grp".

In the related case
select grp,sum(v) from test where id1 = 1 and id2 = 2 and
day = '1/1/2006' group by grp order by sum(v) desc;
it *is* OK to decide that the indexscan result is effectively sorted
by "grp", so it's important to have this check ... it's just not being
done quite right. Thanks for the test case!

regards, tom lane

---------------------------(end of
broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian Hirt (#4)
Re: bug with aggregate + multi column index + index_scan

Brian Hirt <bhirt@mobygames.com> writes:

Yes, what you describe are exactly the circumstances that are
required for our query to fail. Once again, thanks for the great
help and quick fix. Do you think this fix will make 8.1.3?

It's already in CVS ...

regards, tom lane