== PostgreSQL Weekly News - August 27 2005 ==

Started by David Fetteralmost 21 years ago8 messageshackers

david@fetter.org

almost 21 years ago

== PostgreSQL Weekly News - August 27 2005 ==

PostgreSQL 8.1 beta1 is out. Congratulations to all who have
participated in making this happen.
http://www.postgresql.org/developer/beta

Please test the new beta. Some of the new features are at
http://developer.postgresql.org/docs/postgres/release.html#RELEASE-8-1

Congratulations to Dave Page, the newest addition to Core. Dave has
headed the pgODBC project for a couple of years, started the pgAdmin
project in 1998, has been our lead webmaster for three years, and is
now sharing responsibility for Win32 packaging with Magnus Hagander.
In these latter two roles, Dave's help is indispensable to the
PostgreSQL release process, so he now gets the title for the work he
is already doing.

Andrew Dunstan and Alvaro Herrera are now committers. Kudos!

== PostgreSQL Product News ==

Oryx 0.97 is out
http://www.oryx.com/mailstore/0.97.html

== PostgreSQL in the News ==

General Bits: http://www.varlena.com/GeneralBits/
Overlapping Revisions

PostgreSQL Weekly News is brought to you this week by David Fetter and
Josh Berkus

Simon Riggs

simon@2ndQuadrant.com

almost 21 years ago

In reply to: David Fetter (#1)

Performance gain from reduction of GROUP BY memory allocations

In PostgreSQL Weekly News, David Fetter wrote:

Please test the new beta. Some of the new features are at
http://developer.postgresql.org/docs/postgres/release.html#RELEASE-8-1

I notice that Neil's patch regarding reducing the number of memory
allocations during aggregation operations isn't mentioned. It was
originally discussed in 8.0beta (2-3?) time.

What happened there?
- patch not committed in the end
- committed but not mentioned, as a dropped item
- committed but not mentioned, since part of a larger patch

Seemed like a good performance gain to me...

I *have* tried to find this, but Neil's work on memory allocation has
been extensive... (please take the compliment...).

Best Regards, Simon Riggs

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Simon Riggs (#2)

Re: Performance gain from reduction of GROUP BY memory allocations

Simon Riggs <simon@2ndquadrant.com> writes:

I notice that Neil's patch regarding reducing the number of memory
allocations during aggregation operations isn't mentioned. It was
originally discussed in 8.0beta (2-3?) time.

What happened there?
- patch not committed in the end
- committed but not mentioned, as a dropped item
- committed but not mentioned, since part of a larger patch

Are you speaking of these patches?

2005-04-06 19:56 neilc

* src/backend/utils/adt/: float.c, numeric.c: Apply the "nodeAgg"
optimization to more of the builtin transition functions. This
patch optimizes int2_sum(), int4_sum(), float4_accum() and
float8_accum() to avoid needing to copy the transition function's
state for each input tuple of the aggregate. In an extreme case
(e.g. SELECT sum(int2_col) FROM table where table has a single
column), it improves performance by about 20%. For more complex
queries or tables with wider rows, the relative performance
improvement will not be as significant.

2005-04-04 19:50 neilc

* src/backend/utils/adt/numeric.c: This patch changes
int2_avg_accum() and int4_avg_accum() use the nodeAgg performance
hack Tom introduced recently. This means we can avoid copying the
transition array for each input tuple if these functions are
invoked as aggregate transition functions.

To test the performance improvement, I created a 1 million row
table with a single int4 column. Without the patch, SELECT avg(col)
FROM table took about 4.2 seconds (after the data was cached); with
the patch, it took about 3.2 seconds. Naturally, the performance
improvement for a less trivial query (or a table with wider rows)
would be relatively smaller.

2005-03-12 15:25 tgl

* contrib/intagg/int_aggregate.c,
contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml,
doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c,
src/backend/utils/adt/int8.c: Adjust the API for aggregate function
calls so that a C-coded function can tell whether it is being used
as an aggregate or not. This allows such a function to avoid
re-pallocing a pass-by-reference transition value; normally it
would be unsafe for a function to scribble on an input, but in the
aggregate case it's safe to reuse the old transition value. Make
int8inc() do this. This gets a useful improvement in the speed of
COUNT(*), at least on narrow tables (it seems to be swamped by I/O
when the table rows are wide). Per a discussion in early December
with Neil Conway. I also fixed int_aggregate.c to check this,
thereby turning it into something approaching a supportable
technique instead of being a crude hack.

I don't recall how Neil's original patch differed from what got
applied...

regards, tom lane

Simon Riggs

simon@2ndQuadrant.com

almost 21 years ago

In reply to: Tom Lane (#3)

Re: Performance gain from reduction of GROUP BY memory

On Mon, 2005-08-29 at 20:25 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I notice that Neil's patch regarding reducing the number of memory
allocations during aggregation operations isn't mentioned. It was
originally discussed in 8.0beta (2-3?) time.

What happened there?
- patch not committed in the end
- committed but not mentioned, as a dropped item
- committed but not mentioned, since part of a larger patch

Are you speaking of these patches?

Yes, those look like the ones I mentioned.

Those seem to have a useful performance improvement?

At very least, the change in Aggregate function API should be mentioned,
no?

2005-04-06 19:56 neilc

* src/backend/utils/adt/: float.c, numeric.c: Apply the "nodeAgg"
optimization to more of the builtin transition functions. This
patch optimizes int2_sum(), int4_sum(), float4_accum() and
float8_accum() to avoid needing to copy the transition function's
state for each input tuple of the aggregate. In an extreme case
(e.g. SELECT sum(int2_col) FROM table where table has a single
column), it improves performance by about 20%. For more complex
queries or tables with wider rows, the relative performance
improvement will not be as significant.

2005-04-04 19:50 neilc

* src/backend/utils/adt/numeric.c: This patch changes
int2_avg_accum() and int4_avg_accum() use the nodeAgg performance
hack Tom introduced recently. This means we can avoid copying the
transition array for each input tuple if these functions are
invoked as aggregate transition functions.

To test the performance improvement, I created a 1 million row
table with a single int4 column. Without the patch, SELECT avg(col)
FROM table took about 4.2 seconds (after the data was cached); with
the patch, it took about 3.2 seconds. Naturally, the performance
improvement for a less trivial query (or a table with wider rows)
would be relatively smaller.

2005-03-12 15:25 tgl

* contrib/intagg/int_aggregate.c,
contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml,
doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c,
src/backend/utils/adt/int8.c: Adjust the API for aggregate function
calls so that a C-coded function can tell whether it is being used
as an aggregate or not. This allows such a function to avoid
re-pallocing a pass-by-reference transition value; normally it
would be unsafe for a function to scribble on an input, but in the
aggregate case it's safe to reuse the old transition value. Make
int8inc() do this. This gets a useful improvement in the speed of
COUNT(*), at least on narrow tables (it seems to be swamped by I/O
when the table rows are wide). Per a discussion in early December
with Neil Conway. I also fixed int_aggregate.c to check this,
thereby turning it into something approaching a supportable
technique instead of being a crude hack.

I'll search CVS directly next time. Thanks.

Best Regards, Simon Riggs

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Simon Riggs (#4)

Re: Performance gain from reduction of GROUP BY memory

Simon Riggs wrote:

On Mon, 2005-08-29 at 20:25 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I notice that Neil's patch regarding reducing the number of memory
allocations during aggregation operations isn't mentioned. It was
originally discussed in 8.0beta (2-3?) time.

What happened there?
- patch not committed in the end
- committed but not mentioned, as a dropped item
- committed but not mentioned, since part of a larger patch

Are you speaking of these patches?

Yes, those look like the ones I mentioned.

Those seem to have a useful performance improvement?

At very least, the change in Aggregate function API should be mentioned,
no?

2005-03-12 15:25 tgl

* contrib/intagg/int_aggregate.c,
contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml,
doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c,
src/backend/utils/adt/int8.c: Adjust the API for aggregate function
calls so that a C-coded function can tell whether it is being used
as an aggregate or not. This allows such a function to avoid
re-pallocing a pass-by-reference transition value; normally it
would be unsafe for a function to scribble on an input, but in the
aggregate case it's safe to reuse the old transition value. Make
int8inc() do this. This gets a useful improvement in the speed of
COUNT(*), at least on narrow tables (it seems to be swamped by I/O
when the table rows are wide). Per a discussion in early December
with Neil Conway. I also fixed int_aggregate.c to check this,
thereby turning it into something approaching a supportable
technique instead of being a crude hack.

I don't usually document internal API changes in the release notes.
Should I?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Alvaro Herrera

alvherre@2ndquadrant.com

almost 21 years ago

In reply to: Bruce Momjian (#5)

Re: Performance gain from reduction of GROUP BY memory

On Tue, Aug 30, 2005 at 10:23:49AM -0400, Bruce Momjian wrote:

2005-03-12 15:25 tgl

* contrib/intagg/int_aggregate.c,
contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml,
doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c,
src/backend/utils/adt/int8.c: Adjust the API for aggregate function
calls so that a C-coded function can tell whether it is being used
as an aggregate or not. This allows such a function to avoid
re-pallocing a pass-by-reference transition value; normally it
would be unsafe for a function to scribble on an input, but in the
aggregate case it's safe to reuse the old transition value. Make
int8inc() do this. This gets a useful improvement in the speed of
COUNT(*), at least on narrow tables (it seems to be swamped by I/O
when the table rows are wide). Per a discussion in early December
with Neil Conway. I also fixed int_aggregate.c to check this,
thereby turning it into something approaching a supportable
technique instead of being a crude hack.

I don't usually document internal API changes in the release notes.
Should I?

Doesn't this potentially affect user-defined aggregates?

--
Alvaro Herrera <alvherre[]alvh.no-ip.org> Architect, www.EnterpriseDB.com
"Lo esencial es invisible para los ojos" (A. de Saint Exï¿½pery)

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Alvaro Herrera (#6)

Re: Performance gain from reduction of GROUP BY memory

Alvaro Herrera wrote:

On Tue, Aug 30, 2005 at 10:23:49AM -0400, Bruce Momjian wrote:

2005-03-12 15:25 tgl

* contrib/intagg/int_aggregate.c,
contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml,
doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c,
src/backend/utils/adt/int8.c: Adjust the API for aggregate function
calls so that a C-coded function can tell whether it is being used
as an aggregate or not. This allows such a function to avoid
re-pallocing a pass-by-reference transition value; normally it
would be unsafe for a function to scribble on an input, but in the
aggregate case it's safe to reuse the old transition value. Make
int8inc() do this. This gets a useful improvement in the speed of
COUNT(*), at least on narrow tables (it seems to be swamped by I/O
when the table rows are wide). Per a discussion in early December
with Neil Conway. I also fixed int_aggregate.c to check this,
thereby turning it into something approaching a supportable
technique instead of being a crude hack.

I don't usually document internal API changes in the release notes.
Should I?

Doesn't this potentially affect user-defined aggregates?

I read it as something that _could_ be used by user-defined aggregates,
but not something that would require a changes to a user-defined
aggregate.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Bruce Momjian (#7)

Re: Performance gain from reduction of GROUP BY memory

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Alvaro Herrera wrote:

On Tue, Aug 30, 2005 at 10:23:49AM -0400, Bruce Momjian wrote:

I don't usually document internal API changes in the release notes.
Should I?

Doesn't this potentially affect user-defined aggregates?

I read it as something that _could_ be used by user-defined aggregates,
but not something that would require a changes to a user-defined
aggregate.

I tend to agree with Bruce here. If he documented changes of this size
the release notes would be twice as long and even fewer people would
read them.

regards, tom lane