HAVING push-down

Started by Simon Riggsalmost 19 years ago7 messages

simon@2ndquadrant.com

almost 19 years ago

I've just read a paper that says PostgreSQL doesn't do this. My reading
of the code is that we *do* evaluate the HAVING clause prior to
calculating the aggregates for it. I thought I'd check to resolve the
confusion.

- - -

If not, it seems fairly straightforward to push down some or all of a
HAVING clause so that the qual clause is tested prior to aggregation,
not after aggregation. This could, for certain queries, significantly
reduce the amount of effort that the final Agg node performs.

We might think about deeper push-down within the query, but since the
Agg node already has the havingQual, it seems a straightforward act to
decide whether to apply it before or after the aggregation.

We already do find_unaggregated_cols(), so little additional analysis
seems required.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Gregory Stark

stark@enterprisedb.com

almost 19 years ago

In reply to: Simon Riggs (#1)

Re: HAVING push-down

"Simon Riggs" <simon@2ndquadrant.com> writes:

I've just read a paper that says PostgreSQL doesn't do this. My reading
of the code is that we *do* evaluate the HAVING clause prior to
calculating the aggregates for it. I thought I'd check to resolve the
confusion.

- - -

If not, it seems fairly straightforward to push down some or all of a
HAVING clause so that the qual clause is tested prior to aggregation,
not after aggregation. This could, for certain queries, significantly
reduce the amount of effort that the final Agg node performs.

You mean in cases like this?

postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..13.61 rows=1 width=12)
-> Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12)
Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
(3 rows)

I think we push having clauses into WHERE clauses whenever there are no
aggregates in them.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndquadrant.com

almost 19 years ago

In reply to: Gregory Stark (#2)

Re: HAVING push-down

On Fri, 2007-01-26 at 15:22 +0000, Gregory Stark wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

I've just read a paper that says PostgreSQL doesn't do this. My reading
of the code is that we *do* evaluate the HAVING clause prior to
calculating the aggregates for it. I thought I'd check to resolve the
confusion.

You mean in cases like this?

postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..13.61 rows=1 width=12)
-> Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12)
Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
(3 rows)

OK, thanks. I'll feedback to the author of the paper I was reviewing.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Joshua D. Drake

jd@commandprompt.com

almost 19 years ago

In reply to: Simon Riggs (#3)

Re: HAVING push-down

Simon Riggs wrote:

On Fri, 2007-01-26 at 15:22 +0000, Gregory Stark wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

I've just read a paper that says PostgreSQL doesn't do this. My reading
of the code is that we *do* evaluate the HAVING clause prior to
calculating the aggregates for it. I thought I'd check to resolve the
confusion.

You mean in cases like this?

postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..13.61 rows=1 width=12)
-> Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12)
Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
(3 rows)

OK, thanks. I'll feedback to the author of the paper I was reviewing.

Care to share the paper in general? It might be beneficial for all of us.

Joshua D. Drake

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

Tom Lane

tgl@sss.pgh.pa.us

almost 19 years ago

In reply to: Simon Riggs (#1)

Re: HAVING push-down

"Simon Riggs" <simon@2ndquadrant.com> writes:

I've just read a paper that says PostgreSQL doesn't do this.

What does he mean by that exactly, and which PG version is he looking
at? As Greg notes, we do know how to push down non-aggregated
conditions, but I'm not sure that's what he's thinking of. There have
been some relevant bug fixes, eg

2004-07-10 14:39 tgl

* src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
HAVING condition before computing targetlist of an Aggregate node.
This is required by SQL spec to avoid failures in cases like
SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) >
0; AFAICT we have gotten this wrong since day one. Kudos to Holger
Jakobs for being the first to notice.

Also, it's still true that we run all the aggregate transition functions
in parallel, so if you were hoping to use HAVING on an aggregate
condition to prevent an overflow or something in the state accumulation
function for a targetlist aggregate, you'd lose. But I don't see any
way to avoid that without scanning the data twice, which we're surely
not gonna do.

regards, tom lane

Simon Riggs

simon@2ndquadrant.com

almost 19 years ago

In reply to: Tom Lane (#5)

Re: HAVING push-down

On Fri, 2007-01-26 at 11:16 -0500, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

I've just read a paper that says PostgreSQL doesn't do this.

What does he mean by that exactly, and which PG version is he looking
at? As Greg notes, we do know how to push down non-aggregated
conditions, but I'm not sure that's what he's thinking of.

Yes, it was specifically non-aggregated conditions.

There have
been some relevant bug fixes, eg

2004-07-10 14:39 tgl

* src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
HAVING condition before computing targetlist of an Aggregate node.
This is required by SQL spec to avoid failures in cases like
SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) >
0; AFAICT we have gotten this wrong since day one. Kudos to Holger
Jakobs for being the first to notice.

Also, it's still true that we run all the aggregate transition functions
in parallel, so if you were hoping to use HAVING on an aggregate
condition to prevent an overflow or something in the state accumulation
function for a targetlist aggregate, you'd lose. But I don't see any
way to avoid that without scanning the data twice, which we're surely
not gonna do.

I'll send you the paper off-line, there's some more interesting stuff
also. p.12

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndquadrant.com

almost 19 years ago

In reply to: Joshua D. Drake (#4)

Re: HAVING push-down

On Fri, 2007-01-26 at 07:46 -0800, Joshua D. Drake wrote:

Care to share the paper in general? It might be beneficial for all of us.

I'll ask the author, but don't expect an immediate response.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com