proposal: tuplestore, tuplesort aggregate functions

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: tuplestore, tuplesort aggregate functions

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of spare
parts. It's also exposing some implementation details that I'd just as
soon not expose in SQL. I'd rather just implement median as a
special-purpose aggregate.

regards, tom lane

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#2)

Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of spare
parts. It's also exposing some implementation details that I'd just as
soon not expose in SQL. I'd rather just implement median as a
special-purpose aggregate.

yes, it is little bit strange - but when we talked last time about
this topic, I understand, so you dislike any special solution for this
functionality. So I searched different more general way. On the other
hand, I agree so special purpose aggregate (with a few changes in
nodeAgg) can be enough. The median (and additional forms) is really
special and there are not wide used use case.

Regards

Pavel

Show quoted text

regards, tom lane

david@fetter.org

over 15 years ago

In reply to: Pavel Stehule (#3)

Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of
spare parts. ï¿½It's also exposing some implementation details that
I'd just as soon not expose in SQL. ï¿½I'd rather just implement
median as a special-purpose aggregate.

yes, it is little bit strange - but when we talked last time about
this topic, I understand, so you dislike any special solution for
this functionality. So I searched different more general way. On the
other hand, I agree so special purpose aggregate (with a few changes
in nodeAgg) can be enough. The median (and additional forms) is
really special and there are not wide used use case.

Which median do you plan to implement? Or do you plan to implement
several different medians, each with distinguishing names?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

pavel.stehule@gmail.com

over 15 years ago

In reply to: David Fetter (#4)

Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 David Fetter <david@fetter.org>:

On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of
spare parts. It's also exposing some implementation details that
I'd just as soon not expose in SQL. I'd rather just implement
median as a special-purpose aggregate.

yes, it is little bit strange - but when we talked last time about
this topic, I understand, so you dislike any special solution for
this functionality. So I searched different more general way. On the
other hand, I agree so special purpose aggregate (with a few changes
in nodeAgg) can be enough. The median (and additional forms) is
really special and there are not wide used use case.

Which median do you plan to implement? Or do you plan to implement
several different medians, each with distinguishing names?

my proposal enabled implementation of any "median like" function. But
if we implement median as special case of aggregate, then some basic
"median" will be implemented.

Regards

Pavel

Show quoted text

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

david@fetter.org

over 15 years ago

In reply to: Pavel Stehule (#5)

Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:

2010/8/18 David Fetter <david@fetter.org>:

Which median do you plan to implement? ï¿½Or do you plan to implement
several different medians, each with distinguishing names?

my proposal enabled implementation of any "median like" function. But
if we implement median as special case of aggregate, then some basic
"median" will be implemented.

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

Otherwise, it seems like a more reasonable thing to make the medians
special case code.

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: David Fetter (#6)

Re: proposal: tuplestore, tuplesort aggregate functions

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

regards, tom lane

pavel.stehule@gmail.com

over 15 years ago

In reply to: David Fetter (#6)

Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 David Fetter <david@fetter.org>:

On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:

2010/8/18 David Fetter <david@fetter.org>:

Which median do you plan to implement? Or do you plan to implement
several different medians, each with distinguishing names?

my proposal enabled implementation of any "median like" function. But
if we implement median as special case of aggregate, then some basic
"median" will be implemented.

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

I didn't think about some special median - this proposal is just about
aggregates with large a transistent data. Then the access to
tuplestore can be very usefull.

Otherwise, it seems like a more reasonable thing to make the medians
special case code.

yes, minimally for this moment.

Regards

Pavel

Show quoted text

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

david@fetter.org

over 15 years ago

In reply to: Tom Lane (#7)

Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Thanks for clarifying. Might this help out with things like GROUPING
SETS or wCTEs?

Cheers,
David (a little slow today).
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#10

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#7)

Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function. This mechanism is used for aggregates with ORDER BY now. So
there can be a special path for direct call of final func. There is
useles to call transident function.

Regards

Pavel

Show quoted text

regards, tom lane

#11

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: David Fetter (#9)

Re: proposal: tuplestore, tuplesort aggregate functions

David Fetter <david@fetter.org> writes:

On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Thanks for clarifying. Might this help out with things like GROUPING
SETS or wCTEs?

Don't see how --- this is just about what you can do within an aggregate.

regards, tom lane

#12

david@fetter.org

over 15 years ago

In reply to: Pavel Stehule (#10)

Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 04:46:57PM +0200, Pavel Stehule wrote:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you
have in mind to start with? ï¿½If you can provide examples of
"median-like" aggregates that people might need to implement as
user-defined aggregates, or other places where people would use
this machinery, it will make your case stronger for this
refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. ï¿½All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever. I
don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function. This mechanism is used for aggregates with ORDER BY now.
So there can be a special path for direct call of final func. There
is useles to call transident function.

Just a wacky idea here. Could we make a special state transition
function called IDENTITY or some such that would turn into a noop?

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#10)

Re: proposal: tuplestore, tuplesort aggregate functions

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function.

Well, that would require a whole bunch of *other* mechanisms, which you
weren't saying anything about in your original proposal. But driving
it off the transtype declaration would be quite inappropriate anyway IMO.

regards, tom lane

#14

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#13)

Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function.

Well, that would require a whole bunch of *other* mechanisms, which you
weren't saying anything about in your original proposal. But driving
it off the transtype declaration would be quite inappropriate anyway IMO.

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort. So I
think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

Regards

Pavel Stehule

Show quoted text

regards, tom lane

#15

pavel.stehule@gmail.com

over 15 years ago

In reply to: Pavel Stehule (#14)

Re: proposal: tuplestore, tuplesort aggregate functions

Hello

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort. So I
think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

yesterday I did a small test. Aggregates without transident functions
are only about 2% faster, so there has no sense thinking more about
them. I'll send a patch with median and percentile functions
immediately - these functions are implemented like usual aggregates.

Regards

Pavel

Show quoted text

Regards

Pavel Stehule

regards, tom lane

#16

david@fetter.org

over 15 years ago

In reply to: Pavel Stehule (#15)

Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 12:45:13PM +0200, Pavel Stehule wrote:

Hello

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort.
So I think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

yesterday I did a small test. Aggregates without transident
functions are only about 2% faster, so there has no sense thinking
more about them. I'll send a patch with median and percentile
functions immediately - these functions are implemented like usual
aggregates.

NTILE is already a windowing function. Might want to check into any
performance improvements you can give that.

As to median, please make sure you say in detail which median you're
using and name it so, as there is no single, authoritative median.

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#17

pavel.stehule@gmail.com

over 15 years ago

In reply to: David Fetter (#16)

Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/19 David Fetter <david@fetter.org>:

On Thu, Aug 19, 2010 at 12:45:13PM +0200, Pavel Stehule wrote:

Hello

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort.
So I think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

yesterday I did a small test. Aggregates without transident
functions are only about 2% faster, so there has no sense thinking
more about them. I'll send a patch with median and percentile
functions immediately - these functions are implemented like usual
aggregates.

NTILE is already a windowing function. Might want to check into any
performance improvements you can give that.

The performance has to be +/- same. It is based on same technology - tuplesort.

As to median, please make sure you say in detail which median you're
using and name it so, as there is no single, authoritative median.

I searched about this. And I found so there is a different methods for
same statistic value with different names. But result has to be same.
I don't found a other median than "arithmetic" or "financial" median
(with a two derived forms - left, right median). These methods are a
different forms of some SQL - see
http://www.simple-talk.com/sql/t-sql-programming/median-workbench/ and
it is SQL related solutions. With tuplesort I can to use a simple
solution - I have a number of elements, have a sorted set and can
move to n position is set.

Next forms of median what I found are estimations.

regards

Pavel

Show quoted text

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#18

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: David Fetter (#16)

Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 9:46 AM, David Fetter <david@fetter.org> wrote:

As to median, please make sure you say in detail which median you're
using and name it so, as there is no single, authoritative median.

You've made this assertion at least three times now, but I confess
that I've only ever learned one way to compute a median; and quick
Google searches for "median", "kinds of median", and few other
variants failed to turn up anything obvious either. It seems to me
that if median is good enough for Oracle, Sybase, Excel, and the nun
who taught my fifth-grade elementary school class, it ought to be good
enough for us, too. (Don't mess with Sr. Catherine!)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#19

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#18)

Re: proposal: tuplestore, tuplesort aggregate functions

Robert Haas <robertmhaas@gmail.com> writes:

You've made this assertion at least three times now, but I confess
that I've only ever learned one way to compute a median; and quick
Google searches for "median", "kinds of median", and few other
variants failed to turn up anything obvious either.

There are different ways to define it when the number of samples is even.
However I believe that "use the mean of the two middle values" is much
the most common way to deal with that.

regards, tom lane

#20

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#19)

Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

You've made this assertion at least three times now, but I confess
that I've only ever learned one way to compute a median; and quick
Google searches for "median", "kinds of median", and few other
variants failed to turn up anything obvious either.

There are different ways to define it when the number of samples is even.
However I believe that "use the mean of the two middle values" is much
the most common way to deal with that.

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. Like, what
do you do with {1, 2}?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#21

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 15 years ago

In reply to: Robert Haas (#20)

Re: proposal: tuplestore, tuplesort aggregate functions

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're

working

with a type like int4 where the values are discrete steps. Like,

what

do you do with {1, 2}?

The same thing you do with the avg function?

-Kevin

#22

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Kevin Grittner (#21)

Re: proposal: tuplestore, tuplesort aggregate functions

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. Like, what
do you do with {1, 2}?

Hmm, good point.

The same thing you do with the avg function?

avg's approach is not at all datatype-independent though. If you're
willing to give up the idea of a polymorphic median() function, that
would probably be the thing to do. If not, you need to take the left
or right one of the two central elements.

regards, tom lane

#23

david@fetter.org

over 15 years ago

In reply to: Tom Lane (#22)

Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. Like, what
do you do with {1, 2}?

Hmm, good point.

The same thing you do with the avg function?

avg's approach is not at all datatype-independent though. If you're
willing to give up the idea of a polymorphic median() function, that
would probably be the thing to do. If not, you need to take the left
or right one of the two central elements.

Whether the median needs to be in the sample is one question that
doesn't really have a universal right answer. A separate question,
also without a universal right answer, is whether the median needs to
be in the same domain as the sample, int4 being the case above.

We could just go with "whatever Oracle, DB2 and MS-SQL Server have,"
assuming it's the same thing, until something appears in the SQL
standard.

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#24

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: David Fetter (#23)

Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 5:53 PM, David Fetter <david@fetter.org> wrote:

On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. Like, what
do you do with {1, 2}?

Hmm, good point.

The same thing you do with the avg function?

avg's approach is not at all datatype-independent though. If you're
willing to give up the idea of a polymorphic median() function, that
would probably be the thing to do. If not, you need to take the left
or right one of the two central elements.

Whether the median needs to be in the sample is one question that
doesn't really have a universal right answer. A separate question,
also without a universal right answer, is whether the median needs to
be in the same domain as the sample, int4 being the case above.

We could just go with "whatever Oracle, DB2 and MS-SQL Server have,"
assuming it's the same thing, until something appears in the SQL
standard.

That's usually a sensible default, when in doubt. If nothing else, it
improves compatibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#25