Re: [SQL] What's wrong with this group by clause?

Started by Manfred Koizaralmost 23 years ago4 messages
#1Manfred Koizar
mkoi-pg@aon.at

[forwarding to -hackers]

On Tue, 4 Nov 2003 18:28:12 -0300, Franco Bruno Borghesi
<franco@akyasociados.com.ar> wrote:

Below you can find a simplified example of a real case.
I don't understand why I'm getting the "john" record twice.

ISTM you have found a Postgres 7.3 bug.

I get one john with
PostgreSQL 7.1.3 on i686-pc-cygwin, compiled by GCC 2.95.3-5
and
PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1

but two johns with
PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1

/*EXAMPLE*/
CREATE TABLE people
(
name TEXT
);
INSERT INTO people VALUES ('john');
INSERT INTO people VALUES ('john');
INSERT INTO people VALUES ('pete');
INSERT INTO people VALUES ('pete');
INSERT INTO people VALUES ('ernest');
INSERT INTO people VALUES ('john');

SELECT
0 AS field1,
0 AS field2,
name
FROM
people
GROUP BY
field1,
field2,
name;

field1 | field2 | name
--------+--------+--------
0 | 0 | john
0 | 0 | pete
0 | 0 | ernest
0 | 0 | john
(4 rows)

Same for
SELECT 0 AS field1, 0 AS field2, name
FROM people
GROUP BY 1, 2, name;

Servus
Manfred

#2Christoph Haller
ch@rodos.fzk.de
In reply to: Manfred Koizar (#1)

On Tue, 4 Nov 2003 18:28:12 -0300, Franco Bruno Borghesi
<franco@akyasociados.com.ar> wrote:

Below you can find a simplified example of a real case.
I don't understand why I'm getting the "john" record twice.

ISTM you have found a Postgres 7.3 bug.

I get one john with
PostgreSQL 7.1.3 on i686-pc-cygwin, compiled by GCC 2.95.3-5
and
PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1

but two johns with
PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1

/*EXAMPLE*/
CREATE TABLE people
(
name TEXT
);
INSERT INTO people VALUES ('john');
INSERT INTO people VALUES ('john');
INSERT INTO people VALUES ('pete');
INSERT INTO people VALUES ('pete');
INSERT INTO people VALUES ('ernest');
INSERT INTO people VALUES ('john');

SELECT
0 AS field1,
0 AS field2,
name
FROM
people
GROUP BY
field1,
field2,
name;

field1 | field2 | name
--------+--------+--------
0 | 0 | john
0 | 0 | pete
0 | 0 | ernest
0 | 0 | john
(4 rows)

PostgreSQL 7.2.3 on hppa-hp-hpux10.20, compiled by GCC 2.95.2

SELECT 0 AS field1, 0 AS field2, name FROM people GROUP BY
field1, field2, name;
field1 | field2 | name
--------+--------+--------
0 | 0 | ernest
0 | 0 | john
0 | 0 | pete
(3 rows)

PostgreSQL 7.3.2 on hppa-hp-hpux10.20, compiled by GCC 2.95.2

SELECT 0 AS field1, 0 AS field2, name FROM people GROUP BY
field1, field2, name;
field1 | field2 | name
--------+--------+--------
0 | 0 | john
0 | 0 | pete
0 | 0 | john
0 | 0 | pete
0 | 0 | john
0 | 0 | ernest
(6 rows)

I doubt this is a bug in 7.3.2 but in prior versions.
I've cross-checked how another DBMS (HP's ALLBASE) handles GROUP BY
without an aggregate, and it acts like 7.3.2.

Regards, Christoph

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Manfred Koizar (#1)

Manfred Koizar <mkoi-pg@aon.at> writes:

ISTM you have found a Postgres 7.3 bug.

Yeah. Actually, the planner bug has been there a long time, but it was
only latent until the parser stopped suppressing duplicate GROUP BY
items:

2002-08-18 14:46 tgl

* src/backend/parser/parse_clause.c: Remove optimization whereby
parser would make only one sort-list entry when two equal()
targetlist items were to be added to an ORDER BY or DISTINCT list.
Although indeed this would make sorting fractionally faster by
sometimes saving a comparison, it confuses the heck out of later
stages of processing, because it makes it look like the user wrote
DISTINCT ON rather than DISTINCT. Bug reported by
joe@piscitella.com.

7.3 patch is attached if you need it.

regards, tom lane

*** src/backend/optimizer/plan/planner.c.orig	Wed Mar  5 13:38:26 2003
--- src/backend/optimizer/plan/planner.c	Thu Mar 13 11:21:16 2003
***************
*** 1498,1510 ****
  		 * are just dummies with no extra execution cost.)
  		 */
  		List	   *sort_tlist = new_unsorted_tlist(subplan->targetlist);
  		int			keyno = 0;
  		List	   *gl;

foreach(gl, groupClause)
{
GroupClause *grpcl = (GroupClause *) lfirst(gl);
! TargetEntry *te = nth(grpColIdx[keyno] - 1, sort_tlist);
Resdom *resdom = te->resdom;

  			/*
--- 1498,1511 ----
  		 * are just dummies with no extra execution cost.)
  		 */
  		List	   *sort_tlist = new_unsorted_tlist(subplan->targetlist);
+ 		int			grpno = 0;
  		int			keyno = 0;
  		List	   *gl;

foreach(gl, groupClause)
{
GroupClause *grpcl = (GroupClause *) lfirst(gl);
! TargetEntry *te = nth(grpColIdx[grpno] - 1, sort_tlist);
Resdom *resdom = te->resdom;

  			/*
***************
*** 1518,1523 ****
--- 1519,1525 ----
  				resdom->reskey = ++keyno;
  				resdom->reskeyop = grpcl->sortop;
  			}
+ 			grpno++;
  		}

Assert(keyno > 0);

#4Franco Bruno Borghesi
franco@akyasociados.com.ar
In reply to: Tom Lane (#3)
Re: [HACKERS] What's wrong with this group by clause?

Thanks Tom, I applied the patch and it works perfect now.
Thanks to you all.

Show quoted text

On Thursday 13 March 2003 14:02, Tom Lane wrote:

Manfred Koizar <mkoi-pg@aon.at> writes:

ISTM you have found a Postgres 7.3 bug.

Yeah. Actually, the planner bug has been there a long time, but it was
only latent until the parser stopped suppressing duplicate GROUP BY
items:

2002-08-18 14:46 tgl

* src/backend/parser/parse_clause.c: Remove optimization whereby
parser would make only one sort-list entry when two equal()
targetlist items were to be added to an ORDER BY or DISTINCT list.
Although indeed this would make sorting fractionally faster by
sometimes saving a comparison, it confuses the heck out of later
stages of processing, because it makes it look like the user wrote
DISTINCT ON rather than DISTINCT. Bug reported by
joe@piscitella.com.

7.3 patch is attached if you need it.

regards, tom lane

*** src/backend/optimizer/plan/planner.c.orig	Wed Mar  5 13:38:26 2003
--- src/backend/optimizer/plan/planner.c	Thu Mar 13 11:21:16 2003
***************
*** 1498,1510 ****
* are just dummies with no extra execution cost.)
*/
List	   *sort_tlist = new_unsorted_tlist(subplan->targetlist);
int			keyno = 0;
List	   *gl;

foreach(gl, groupClause)
{
GroupClause *grpcl = (GroupClause *) lfirst(gl);
! TargetEntry *te = nth(grpColIdx[keyno] - 1, sort_tlist);
Resdom *resdom = te->resdom;

/*
--- 1498,1511 ----
* are just dummies with no extra execution cost.)
*/
List	   *sort_tlist = new_unsorted_tlist(subplan->targetlist);
+ 		int			grpno = 0;
int			keyno = 0;
List	   *gl;

foreach(gl, groupClause)
{
GroupClause *grpcl = (GroupClause *) lfirst(gl);
! TargetEntry *te = nth(grpColIdx[grpno] - 1, sort_tlist);
Resdom *resdom = te->resdom;

/*
***************
*** 1518,1523 ****
--- 1519,1525 ----
resdom->reskey = ++keyno;
resdom->reskeyop = grpcl->sortop;
}
+ 			grpno++;
}

Assert(keyno > 0);