proposal: tuplestore, tuplesort aggregate functions

Started by Pavel Stehuleover 15 years ago25 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hello

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

when stype is ARRAY of then final and transistent functions can be a
PL functions. When stype isn't scalar, then sfunc can be undefined (it
use a buildin functions). Then we can implement a aggregate only with
final functions.

so median function can be defined:

CREATE FUNCTION num_median_final(internal) RETURNS numeric AS ...
CREATE AGGREGATE median(numeric) (stype = TUPLESORT OF numeric,
finalfunc = num_median_final);

This feature has impact primary on agg executor, and can be relative
simple - no planner changes (or not big), minimal parser changes.

Main reason for this feature is possible access to tuplesort and
tuplesort. I hope, so this can solve a problems with computing a
median and similar functions on very large datasets.

comments?

regards

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#1)
Re: proposal: tuplestore, tuplesort aggregate functions

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of spare
parts. It's also exposing some implementation details that I'd just as
soon not expose in SQL. I'd rather just implement median as a
special-purpose aggregate.

regards, tom lane

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#2)
Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of spare
parts.  It's also exposing some implementation details that I'd just as
soon not expose in SQL.  I'd rather just implement median as a
special-purpose aggregate.

yes, it is little bit strange - but when we talked last time about
this topic, I understand, so you dislike any special solution for this
functionality. So I searched different more general way. On the other
hand, I agree so special purpose aggregate (with a few changes in
nodeAgg) can be enough. The median (and additional forms) is really
special and there are not wide used use case.

Regards

Pavel

Show quoted text

                       regards, tom lane

#4David Fetter
david@fetter.org
In reply to: Pavel Stehule (#3)
Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of
spare parts. �It's also exposing some implementation details that
I'd just as soon not expose in SQL. �I'd rather just implement
median as a special-purpose aggregate.

yes, it is little bit strange - but when we talked last time about
this topic, I understand, so you dislike any special solution for
this functionality. So I searched different more general way. On the
other hand, I agree so special purpose aggregate (with a few changes
in nodeAgg) can be enough. The median (and additional forms) is
really special and there are not wide used use case.

Which median do you plan to implement? Or do you plan to implement
several different medians, each with distinguishing names?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#4)
Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 David Fetter <david@fetter.org>:

On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of
spare parts.  It's also exposing some implementation details that
I'd just as soon not expose in SQL.  I'd rather just implement
median as a special-purpose aggregate.

yes, it is little bit strange - but when we talked last time about
this topic, I understand, so you dislike any special solution for
this functionality. So I searched different more general way. On the
other hand, I agree so special purpose aggregate (with a few changes
in nodeAgg) can be enough. The median (and additional forms) is
really special and there are not wide used use case.

Which median do you plan to implement?  Or do you plan to implement
several different medians, each with distinguishing names?

my proposal enabled implementation of any "median like" function. But
if we implement median as special case of aggregate, then some basic
"median" will be implemented.

Regards

Pavel

Show quoted text

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#6David Fetter
david@fetter.org
In reply to: Pavel Stehule (#5)
Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:

2010/8/18 David Fetter <david@fetter.org>:

Which median do you plan to implement? �Or do you plan to implement
several different medians, each with distinguishing names?

my proposal enabled implementation of any "median like" function. But
if we implement median as special case of aggregate, then some basic
"median" will be implemented.

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

Otherwise, it seems like a more reasonable thing to make the medians
special case code.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#6)
Re: proposal: tuplestore, tuplesort aggregate functions

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

regards, tom lane

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#6)
Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 David Fetter <david@fetter.org>:

On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:

2010/8/18 David Fetter <david@fetter.org>:

Which median do you plan to implement?  Or do you plan to implement
several different medians, each with distinguishing names?

my proposal enabled implementation of any "median like" function. But
if we implement median as special case of aggregate, then some basic
"median" will be implemented.

Apart from the medians, which "median-like" aggregates do you have in
mind to start with?  If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

I didn't think about some special median - this proposal is just about
aggregates with large a transistent data. Then the access to
tuplestore can be very usefull.

Otherwise, it seems like a more reasonable thing to make the medians
special case code.

yes, minimally for this moment.

Regards

Pavel

Show quoted text

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#9David Fetter
david@fetter.org
In reply to: Tom Lane (#7)
Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you have in
mind to start with? If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Thanks for clarifying. Might this help out with things like GROUPING
SETS or wCTEs?

Cheers,
David (a little slow today).
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#7)
Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you have in
mind to start with?  If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions.  All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function. This mechanism is used for aggregates with ORDER BY now. So
there can be a special path for direct call of final func. There is
useles to call transident function.

Regards

Pavel

Show quoted text

                       regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#9)
Re: proposal: tuplestore, tuplesort aggregate functions

David Fetter <david@fetter.org> writes:

On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Thanks for clarifying. Might this help out with things like GROUPING
SETS or wCTEs?

Don't see how --- this is just about what you can do within an aggregate.

regards, tom lane

#12David Fetter
david@fetter.org
In reply to: Pavel Stehule (#10)
Re: proposal: tuplestore, tuplesort aggregate functions

On Wed, Aug 18, 2010 at 04:46:57PM +0200, Pavel Stehule wrote:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

David Fetter <david@fetter.org> writes:

Apart from the medians, which "median-like" aggregates do you
have in mind to start with? �If you can provide examples of
"median-like" aggregates that people might need to implement as
user-defined aggregates, or other places where people would use
this machinery, it will make your case stronger for this
refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions. �All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever. I
don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function. This mechanism is used for aggregates with ORDER BY now.
So there can be a special path for direct call of final func. There
is useles to call transident function.

Just a wacky idea here. Could we make a special state transition
function called IDENTITY or some such that would turn into a noop?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#10)
Re: proposal: tuplestore, tuplesort aggregate functions

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

There would be plenty of scope to re-use the machinery without any
SQL-level extensions.  All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function.

Well, that would require a whole bunch of *other* mechanisms, which you
weren't saying anything about in your original proposal. But driving
it off the transtype declaration would be quite inappropriate anyway IMO.

regards, tom lane

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#13)
Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:

There would be plenty of scope to re-use the machinery without any
SQL-level extensions.  All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function.

Well, that would require a whole bunch of *other* mechanisms, which you
weren't saying anything about in your original proposal.  But driving
it off the transtype declaration would be quite inappropriate anyway IMO.

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort. So I
think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

Regards

Pavel Stehule

Show quoted text

                       regards, tom lane

#15Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#14)
Re: proposal: tuplestore, tuplesort aggregate functions

Hello

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort. So I
think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

yesterday I did a small test. Aggregates without transident functions
are only about 2% faster, so there has no sense thinking more about
them. I'll send a patch with median and percentile functions
immediately - these functions are implemented like usual aggregates.

Regards

Pavel

Show quoted text

Regards

Pavel Stehule

                       regards, tom lane

#16David Fetter
david@fetter.org
In reply to: Pavel Stehule (#15)
Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 12:45:13PM +0200, Pavel Stehule wrote:

Hello

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort.
So I think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

yesterday I did a small test. Aggregates without transident
functions are only about 2% faster, so there has no sense thinking
more about them. I'll send a patch with median and percentile
functions immediately - these functions are implemented like usual
aggregates.

NTILE is already a windowing function. Might want to check into any
performance improvements you can give that.

As to median, please make sure you say in detail which median you're
using and name it so, as there is no single, authoritative median.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#17Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#16)
Re: proposal: tuplestore, tuplesort aggregate functions

2010/8/19 David Fetter <david@fetter.org>:

On Thu, Aug 19, 2010 at 12:45:13PM +0200, Pavel Stehule wrote:

Hello

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort.
So I think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

yesterday I did a small test. Aggregates without transident
functions are only about 2% faster, so there has no sense thinking
more about them.  I'll send a patch with median and percentile
functions immediately - these functions are implemented like usual
aggregates.

NTILE is already a windowing function.  Might want to check into any
performance improvements you can give that.

The performance has to be +/- same. It is based on same technology - tuplesort.

As to median, please make sure you say in detail which median you're
using and name it so, as there is no single, authoritative median.

I searched about this. And I found so there is a different methods for
same statistic value with different names. But result has to be same.
I don't found a other median than "arithmetic" or "financial" median
(with a two derived forms - left, right median). These methods are a
different forms of some SQL - see
http://www.simple-talk.com/sql/t-sql-programming/median-workbench/ and
it is SQL related solutions. With tuplesort I can to use a simple
solution - I have a number of elements, have a sorted set and can
move to n position is set.

Next forms of median what I found are estimations.

regards

Pavel

Show quoted text

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#18Robert Haas
robertmhaas@gmail.com
In reply to: David Fetter (#16)
Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 9:46 AM, David Fetter <david@fetter.org> wrote:

As to median, please make sure you say in detail which median you're
using and name it so, as there is no single, authoritative median.

You've made this assertion at least three times now, but I confess
that I've only ever learned one way to compute a median; and quick
Google searches for "median", "kinds of median", and few other
variants failed to turn up anything obvious either. It seems to me
that if median is good enough for Oracle, Sybase, Excel, and the nun
who taught my fifth-grade elementary school class, it ought to be good
enough for us, too. (Don't mess with Sr. Catherine!)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#18)
Re: proposal: tuplestore, tuplesort aggregate functions

Robert Haas <robertmhaas@gmail.com> writes:

You've made this assertion at least three times now, but I confess
that I've only ever learned one way to compute a median; and quick
Google searches for "median", "kinds of median", and few other
variants failed to turn up anything obvious either.

There are different ways to define it when the number of samples is even.
However I believe that "use the mean of the two middle values" is much
the most common way to deal with that.

regards, tom lane

#20Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

You've made this assertion at least three times now, but I confess
that I've only ever learned one way to compute a median; and quick
Google searches for "median", "kinds of median", and few other
variants failed to turn up anything obvious either.

There are different ways to define it when the number of samples is even.
However I believe that "use the mean of the two middle values" is much
the most common way to deal with that.

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. Like, what
do you do with {1, 2}?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#21Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#20)
Re: proposal: tuplestore, tuplesort aggregate functions

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're

working

with a type like int4 where the values are discrete steps. Like,

what

do you do with {1, 2}?

The same thing you do with the avg function?

-Kevin

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#21)
Re: proposal: tuplestore, tuplesort aggregate functions

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. Like, what
do you do with {1, 2}?

Hmm, good point.

The same thing you do with the avg function?

avg's approach is not at all datatype-independent though. If you're
willing to give up the idea of a polymorphic median() function, that
would probably be the thing to do. If not, you need to take the left
or right one of the two central elements.

regards, tom lane

#23David Fetter
david@fetter.org
In reply to: Tom Lane (#22)
Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. Like, what
do you do with {1, 2}?

Hmm, good point.

The same thing you do with the avg function?

avg's approach is not at all datatype-independent though. If you're
willing to give up the idea of a polymorphic median() function, that
would probably be the thing to do. If not, you need to take the left
or right one of the two central elements.

Whether the median needs to be in the sample is one question that
doesn't really have a universal right answer. A separate question,
also without a universal right answer, is whether the median needs to
be in the same domain as the sample, int4 being the case above.

We could just go with "whatever Oracle, DB2 and MS-SQL Server have,"
assuming it's the same thing, until something appears in the SQL
standard.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#24Robert Haas
robertmhaas@gmail.com
In reply to: David Fetter (#23)
Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 5:53 PM, David Fetter <david@fetter.org> wrote:

On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps.  Like, what
do you do with {1, 2}?

Hmm, good point.

The same thing you do with the avg function?

avg's approach is not at all datatype-independent though.  If you're
willing to give up the idea of a polymorphic median() function, that
would probably be the thing to do.  If not, you need to take the left
or right one of the two central elements.

Whether the median needs to be in the sample is one question that
doesn't really have a universal right answer.  A separate question,
also without a universal right answer, is whether the median needs to
be in the same domain as the sample, int4 being the case above.

We could just go with "whatever Oracle, DB2 and MS-SQL Server have,"
assuming it's the same thing, until something appears in the SQL
standard.

That's usually a sensible default, when in doubt. If nothing else, it
improves compatibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#25David Fetter
david@fetter.org
In reply to: Robert Haas (#24)
Re: proposal: tuplestore, tuplesort aggregate functions

On Thu, Aug 19, 2010 at 06:03:57PM -0400, Robert Haas wrote:

On Thu, Aug 19, 2010 at 5:53 PM, David Fetter <david@fetter.org> wrote:

On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Robert Haas <robertmhaas@gmail.com> wrote:

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps. �Like, what
do you do with {1, 2}?

Hmm, good point.

The same thing you do with the avg function?

avg's approach is not at all datatype-independent though. �If
you're willing to give up the idea of a polymorphic median()
function, that would probably be the thing to do. �If not, you
need to take the left or right one of the two central elements.

Whether the median needs to be in the sample is one question that
doesn't really have a universal right answer. �A separate
question, also without a universal right answer, is whether the
median needs to be in the same domain as the sample, int4 being
the case above.

We could just go with "whatever Oracle, DB2 and MS-SQL Server
have," assuming it's the same thing, until something appears in
the SQL standard.

That's usually a sensible default, when in doubt. If nothing else,
it improves compatibility.

That's assuming we find such a thing, which I haven't yet.

Trying to stay cheery,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate