question about window function in C

Started by Dan Sover 11 years ago6 messagesgeneral
Jump to latest
#1Dan S
strd911@gmail.com

Hi !

I'm trying to write a window function in C .

In the function I'm using a tuplesort to sort the window data and then do
some processing.
Now when I'm running the function I get this: 'WARNING: temporary file
leak: File 43 still referenced'
The warning comes from my failure to call tuplesort_end at the appropriate
time.
From the beginning I thought that calling tuplesort_end when current_pos in
the window approched WinGetPartitionRowCount would do it.
That seemed to work at first but when I put a limit-clause in the query
that doesn't work.
So I think I need to use some callback mechanism to get that to work.
I've found some reference in the postgres source code to
RegisterExprContextCallback which seemed promising but I have no idea how
to get the right expression context to put in as the first argument to that
function.

So my question is, how do I shut down the tuplesort properly after the last
call to my window function ?

I'm running "PostgreSQL 9.3.5 on i686-pc-linux-gnu, compiled by
gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 32-bit"

Best Regards
Dan S

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dan S (#1)
Re: question about window function in C

Dan S <strd911@gmail.com> writes:

I'm trying to write a window function in C .
In the function I'm using a tuplesort to sort the window data and then do
some processing.

Hmm ... why do you feel you need to do that? The window function's input
should already be sorted according to the window specification.

Now when I'm running the function I get this: 'WARNING: temporary file
leak: File 43 still referenced'
The warning comes from my failure to call tuplesort_end at the appropriate
time.

Unsurprising.

So I think I need to use some callback mechanism to get that to work.
I've found some reference in the postgres source code to
RegisterExprContextCallback which seemed promising but I have no idea how
to get the right expression context to put in as the first argument to that
function.

I don't think there is one :-(. WindowAgg has a per-input-tuple econtext,
and a per-output-tuple econtext, but what you'd need for this is a
partition-lifespan econtext, which doesn't exist.

It's possible that we could promote the "partcontext" memory context into
a full econtext so as to support this sort of behavior. But I'd want to
see a reasonably convincing use-case for it.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Dan S
strd911@gmail.com
In reply to: Tom Lane (#2)
Re: question about window function in C

Well I'm trying to implement a window-function that works on range_types
and produces 'atomic ranges' for each input range.
Let's say I have a set of ranges some overlapping some not, and I want to
split each range at every boundary of every overlapping range and return
those.
So for each range r I want to return an array of ranges that consists of
range r split at every overlapping range boundary.
I need to consider both upper and lower boundaries and to be able to do
this in one pass over the data I need to sort both the upper and lower
boundaries and also sort the original ranges so I can loop over the ranges
and boundaries in lockstep to produce the arrays.
As a last step I sort back the arrays of atomic ranges in the original
order in the window so I can read out the tuplesort in that order and
return each array to its corresponding range r.

(The result can be used to answer questions like what are the maximum
number of simultaneously overlapping ranges and at which ranges the maximum
occurs)

Best Regards
Dan S

2014-12-26 18:57 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Show quoted text

Dan S <strd911@gmail.com> writes:

I'm trying to write a window function in C .
In the function I'm using a tuplesort to sort the window data and then do
some processing.

Hmm ... why do you feel you need to do that? The window function's input
should already be sorted according to the window specification.

Now when I'm running the function I get this: 'WARNING: temporary file
leak: File 43 still referenced'
The warning comes from my failure to call tuplesort_end at the

appropriate

time.

Unsurprising.

So I think I need to use some callback mechanism to get that to work.
I've found some reference in the postgres source code to
RegisterExprContextCallback which seemed promising but I have no idea how
to get the right expression context to put in as the first argument to

that

function.

I don't think there is one :-(. WindowAgg has a per-input-tuple econtext,
and a per-output-tuple econtext, but what you'd need for this is a
partition-lifespan econtext, which doesn't exist.

It's possible that we could promote the "partcontext" memory context into
a full econtext so as to support this sort of behavior. But I'd want to
see a reasonably convincing use-case for it.

regards, tom lane

#4Merlin Moncure
mmoncure@gmail.com
In reply to: Dan S (#3)
Re: question about window function in C

On Fri, Dec 26, 2014 at 1:19 PM, Dan S <strd911@gmail.com> wrote:

Well I'm trying to implement a window-function that works on range_types and
produces 'atomic ranges' for each input range.
Let's say I have a set of ranges some overlapping some not, and I want to
split each range at every boundary of every overlapping range and return
those.
So for each range r I want to return an array of ranges that consists of
range r split at every overlapping range boundary.
I need to consider both upper and lower boundaries and to be able to do this
in one pass over the data I need to sort both the upper and lower boundaries
and also sort the original ranges so I can loop over the ranges and
boundaries in lockstep to produce the arrays.
As a last step I sort back the arrays of atomic ranges in the original order
in the window so I can read out the tuplesort in that order and return each
array to its corresponding range r.

(The result can be used to answer questions like what are the maximum number
of simultaneously overlapping ranges and at which ranges the maximum occurs)

Best Regards
Dan S

2014-12-26 18:57 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Dan S <strd911@gmail.com> writes:

I'm trying to write a window function in C .
In the function I'm using a tuplesort to sort the window data and then
do
some processing.

Hmm ... why do you feel you need to do that? The window function's input
should already be sorted according to the window specification.

Now when I'm running the function I get this: 'WARNING: temporary file
leak: File 43 still referenced'
The warning comes from my failure to call tuplesort_end at the
appropriate
time.

Unsurprising.

So I think I need to use some callback mechanism to get that to work.
I've found some reference in the postgres source code to
RegisterExprContextCallback which seemed promising but I have no idea
how
to get the right expression context to put in as the first argument to
that
function.

I don't think there is one :-(. WindowAgg has a per-input-tuple econtext,
and a per-output-tuple econtext, but what you'd need for this is a
partition-lifespan econtext, which doesn't exist.

It's possible that we could promote the "partcontext" memory context into
a full econtext so as to support this sort of behavior. But I'd want to
see a reasonably convincing use-case for it.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#4)
Re: question about window function in C

On Fri, Dec 26, 2014 at 11:41 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Fri, Dec 26, 2014 at 1:19 PM, Dan S <strd911@gmail.com> wrote:

Well I'm trying to implement a window-function that works on range_types and
produces 'atomic ranges' for each input range.
Let's say I have a set of ranges some overlapping some not, and I want to
split each range at every boundary of every overlapping range and return
those.
So for each range r I want to return an array of ranges that consists of
range r split at every overlapping range boundary.
I need to consider both upper and lower boundaries and to be able to do this
in one pass over the data I need to sort both the upper and lower boundaries
and also sort the original ranges so I can loop over the ranges and
boundaries in lockstep to produce the arrays.
As a last step I sort back the arrays of atomic ranges in the original order
in the window so I can read out the tuplesort in that order and return each
array to its corresponding range r.

(The result can be used to answer questions like what are the maximum number
of simultaneously overlapping ranges and at which ranges the maximum occurs)

shoot -- sorry for empty mail -- misclick.

anyways, can you give some precise examples of what you want to occur?
for example, a set of sql taking actual inputs and the output you
would like to see...

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Dan S
strd911@gmail.com
In reply to: Merlin Moncure (#5)
Re: question about window function in C

2014-12-27 6:43 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:

On Fri, Dec 26, 2014 at 11:41 PM, Merlin Moncure <mmoncure@gmail.com>
wrote:

On Fri, Dec 26, 2014 at 1:19 PM, Dan S <strd911@gmail.com> wrote:

Well I'm trying to implement a window-function that works on

range_types and

produces 'atomic ranges' for each input range.
Let's say I have a set of ranges some overlapping some not, and I want

to

split each range at every boundary of every overlapping range and return
those.
So for each range r I want to return an array of ranges that consists of
range r split at every overlapping range boundary.
I need to consider both upper and lower boundaries and to be able to do

this

in one pass over the data I need to sort both the upper and lower

boundaries

and also sort the original ranges so I can loop over the ranges and
boundaries in lockstep to produce the arrays.
As a last step I sort back the arrays of atomic ranges in the original

order

in the window so I can read out the tuplesort in that order and return

each

array to its corresponding range r.

(The result can be used to answer questions like what are the maximum

number

of simultaneously overlapping ranges and at which ranges the maximum

occurs)

shoot -- sorry for empty mail -- misclick.

anyways, can you give some precise examples of what you want to occur?
for example, a set of sql taking actual inputs and the output you
would like to see...

merlin

Say I have theses ranges (in a table or as output of a sql statement):

[2000-01-01,2000-05-01)
[2000-02-01,2003-07-01)
[2002-06-01,2003-03-01)
[2003-04-01,2004-08-01)

then for the first range [2000-01-01,2000-05-01) I want the array output :
{[2000-01-01,2000-02-01),[2000-02-01,2000-05-01)}

for the next range [2000-02-01,2003-07-01) I want the array output:
{[2000-02-01,2000-05-01),[2000-05-01,2002-06-01),[2002-06-01,2003-03-01),[2003-03-01,2003-04-01),[2003-04-01,2003-07-01)}

and for the third range [2002-06-01,2003-03-01) I want the output:
{[2002-06-01,2003-03-01)}

and for the last range [2003-04-01,2004-08-01) the output should be:
{[2003-04-01,2003-07-01),[2003-07-01,2004-08-01)}

As you can see each input range is split up at each boundary where it is
overlapped by the other ranges in the set.
I have written a window function which does this.
On the first call into the window function it calculates all the split up
ranges and stores the resulting arrays into a tuplesort and sort these back
into the original order in the window function.
On subsequent calls into the window function it pulls one array of ranges
at a time from the tuplesort and returns that as output.
Then on the last call into the window function I need to release the
tuplesort so to not leak resources.

The problem is that I can't determine which is the last call into the
function so I can't release the tuplesort at the appropriate time.
I need a way to determine when to release the tuplesort, for example a way
to register a callback at appropriate time.

Sql wise the table can look like this and say I have the 4 ranges above in
the table.

create table t1 (id int, dr daterange);

and the query with my window function: select id, dr,
window_range_split(dr) over () from t1
this query will exhaust the whole window and I can determine when to
release the tuplesort, I release it when I pull out the last tuple in the
window.

However this query won't work: select id, dr, window_range_split(dr) over
() from t1 limit 2
the number of tuples in the window is 4 and I have no idea that there are
only going to be 2 calls into the window function and so I fail release the
tuplesort on the second call.

I should also mention that it's not specifically dateranges I'm targeting.
the window function takes anyrange as argument type and returns anyarray.

Best Regards
Dan S