modifying the table function
Hi there
I am trying to modify the execution of the table function to work in iterator fashion instead of materializing the output. I have been digging through the PostgreSQL source code for about a month now and I can't figure out where the execution of the table function happens. I would be very grateful if anyone could tell me where to begin, as my project is due in only 10 days.
Regards
Islam Hegazy
On Sun, Mar 18, 2007 at 01:54:55PM -0600, Islam Hegazy wrote:
I am trying to modify the execution of the table function to work in
iterator fashion instead of materializing the output. I have been
digging through the PostgreSQL source code for about a month now and I
can't figure out where the execution of the table function happens. I
would be very grateful if anyone could tell me where to begin, as my
project is due in only 10 days.
I've been thinking recently about why it's so difficult. It occurs to
me that the problem is that the language interpreters don't lend
themselves to being iterators.
What you want is that when you call a perl table function, as soon as
the perl function returns a row, that row is passed back to the caller.
That means the perl interpreter has to be able to save all its state,
return to the caller, and resume where it left off when next called.
I don't know if it can do that, but it would have to be implemented for
each language (or use threads).
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
What you want is that when you call a perl table function, as soon as
the perl function returns a row, that row is passed back to the caller.
That means the perl interpreter has to be able to save all its state,
return to the caller, and resume where it left off when next called.
I don't know if it can do that, but it would have to be implemented for
each language (or use threads).
We haven't even worked out how to do that cleanly for plpgsql, which we
control, let alone for any third party interpreter.
I'm not convinced it would be a huge gain anyway. Switching madly in and
out of the perl interpreter at least is a known performance problem,
IIRC - Perl XML objects have, or used to have, problems with that (and
they still don't perform terribly well).
cheers
andrew
Andrew Dunstan wrote:
I'm not convinced it would be a huge gain anyway. Switching madly in
and out of the perl interpreter at least is a known performance
problem, IIRC
Returning control to the backend for every row returned would likely be
excessive, but you could return once every k rows and get most of the
benefits of both approaches (k might be on the order of 1000). The
problem with the current approach is that it makes returning large
result sets from PL functions very expensive, since they need to be
spooled to disk.
As for using threads, that's pretty much a non-starter: we can't safely
allow calls into the backend from multiple concurrent threads, and I
doubt that will change any time soon.
-Neil
Returning k rows would be a reasonable solution, but which functions would
need to be modified to achieve this?
----- Original Message -----
From: "Neil Conway" <neilc@samurai.com>
To: "Andrew Dunstan" <andrew@dunslane.net>
Cc: "Martijn van Oosterhout" <kleptog@svana.org>; "Islam Hegazy"
<islheg@hotmail.com>; <pgsql-hackers@postgresql.org>
Sent: Sunday, March 18, 2007 4:57 PM
Subject: Re: [HACKERS] modifying the table function
Neil Conway <neilc@samurai.com> writes:
Returning control to the backend for every row returned would likely be
excessive, but you could return once every k rows and get most of the
benefits of both approaches (k might be on the order of 1000).
However, this still leaves us with no idea how to persuade perl, tcl,
python, et al to cooperate.
I think you are underestimating the cost of suspending/resuming any of
those interpreters, and overestimating the cost of a tuplestore, which
on a per-tuple basis is really pretty cheap. It's quite likely that the
proposed project would produce piddling or negative gains, after
expending a huge amount of work. (A tenth of the effort on optimizing
tuplestore some more would probably be a better investment.)
A cross-check on this theory could be made without a lot of effort: hack
SQL functions to use a tuplestore (fed via the tuplestore destreceiver,
so as not to exit the executor) instead of return-after-every-tuple.
Compare performance. I kinda suspect you'll find it a loss even there.
regards, tom lane
Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
Returning control to the backend for every row returned would likely be
excessive, but you could return once every k rows and get most of the
benefits of both approaches (k might be on the order of 1000).
However, this still leaves us with no idea how to persuade perl, tcl,
python, et al to cooperate.
It seems like a useful optimization for C-functions, though. I was caught
by surprise a while ago when I realized that the method I've been using to
create simple test data quickly:
INSERT INTO foo SELECT key FROM generate_series(1, <large number>) key
materializes the generate_series result set first.
I'd like to have that changed, even if we leave the behavior as it is
for PLs.
Another affected use case is using dblink to copy large tables.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
Returning control to the backend for every row returned would likely
be excessive, but you could return once every k rows and get most of
the benefits of both approaches (k might be on the order of 1000).
However, this still leaves us with no idea how to persuade perl, tcl,
python, et al to cooperate.
It seems like a useful optimization for C-functions, though. I was
caught by surprise a while ago when I realized that the way I've been
using to create simple test data quickly:
Actually, I think we could teach the PLs to do it - just not
transparently, so we'd need to mark which functions used the new
protocol. Such functions would get a state object as an implied first
argument, so in plperl it might work like this (for a
generate_series-like function):
my $state = shift;
my $low = shift;
my $high = shift;
if ($state->{callstatus} eq 'firstcall')
{
$state->{counter} = $low;
}
elsif ($state->{callstatus} eq 'cleanup')
{
# do cleanup here
$state->{return_status} = 'cleaned';
return;
}
my $next = $state->{counter}++;
$state->{return_status} = $next < $high ? 'result' : 'last_result';
return $next;
To support this I think we'd need to do something like:
create function mygs(int, int)
returns setof int
language plperl
with srfstate
as $$ ... $$;
cheers
andrew
Andrew Dunstan wrote:
Actually, I think we could teach the PLs to do it - just not
transparently, so we'd need to mark which functions used the new
protocol. Such functions would get a state object as an implied first
argument, so in plperl it might work like this (for a
generate_series-like function):
To support this I think we'd need to do something like:
create function mygs(int, int)
returns setof int
language plperl
with srfstate
as $$ ... $$;
Is this not what we do with aggregate functions at present?
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
Andrew Dunstan wrote:
Actually, I think we could teach the PLs to do it - just not
transparently, so we'd need to mark which functions used the new
protocol. Such functions would get a state object as an implied first
argument, so in plperl it might work like this (for a
generate_series-like function):
To support this I think we'd need to do something like:
create function mygs(int, int)
returns setof int
language plperl
with srfstate
as $$ ... $$;
Is this not what we do with aggregate functions at present?
Yes, more or less. That's what made me think of it.
OTOH, before we rush out and do it someone needs to show that it's a net
win. I agree with Tom that making tuplestore faster would probably be a
much better investment of time.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
Richard Huxton wrote:
Is this not what we do with aggregate functions at present?
Yes, more or less. That's what made me think of it.
OTOH, before we rush out and do it someone needs to show that it's a net
win.
Yeah, because this isn't doing anything to address the problem of
entry/exit overhead from calling a PL function many times. I kinda
dislike shoving the problem onto the heads of PL programmers anyway...
regards, tom lane
"Andrew Dunstan" <andrew@dunslane.net> writes:
Yes, more or less. That's what made me think of it.
OTOH, before we rush out and do it someone needs to show that it's a net win. I
agree with Tom that making tuplestore faster would probably be a much better
investment of time.
I don't think the problem with the tuplestore is a matter of speed. It's a
matter of scalability and flexibility. It limits the types of applications
that can use SRFs and the amount of data they can manipulate before it becomes
impractical.
Consider applications like dblink that have SRFs that read data from slow
network sources. Or that generate more data than the server can actually store
at any one time. Or that overflow work_mem but are used in queries that could
return quickly based on the first few records.
Unfortunately, I don't think there's a simple fix that'll work for all PLs
using the current interface. Even languages with iterators themselves (python,
I think) probably don't expect to be called externally while an iterator is in
progress.
It seems to me the way to fix it is to abandon the iterator-style interface
in favour of an interface that allows you to implement an SRF by providing a
function that returns just the "next" record. It would have to save enough state for
the next iteration explicitly in a data structure rather than being able to
depend on the entire program state being restored.
You could argue you could already do this using a non-SRF but there are two
problems: 1) there's no convenient way to stash the state anywhere and 2) it
wouldn't be convenient to use in SQL in FROM clauses the way SRFs are.
IIRC there is already a hack in at least one of the PLs to stash data in a
place where you can access it in repeated invocations. It doesn't work
correctly if you call your function from two different places in a query. It
would take executor support for such state data structures to fix that
problem.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
"Andrew Dunstan" <andrew@dunslane.net> writes:
Yes, more or less. That's what made me think of it.
OTOH, before we rush out and do it someone needs to show that it's a net win. I
agree with Tom that making tuplestore faster would probably be a much better
investment of time.
I don't think the problem with the tuplestore is a matter of speed. It's a
matter of scalability and flexibility. It limits the types of applications
that can use SRFs and the amount of data they can manipulate before it becomes
impractical.
Consider applications like dblink that have SRFs that read data from slow
network sources. Or that generate more data than the server can actually store
at any one time. Or that overflow work_mem but are used in queries that could
return quickly based on the first few records.
Unfortunately, I don't think there's a simple fix that'll work for all PLs
using the current interface. Even languages with iterators themselves (python,
I think) probably don't expect to be called externally while an iterator is in
progress.
Just a thought - I believe that there are portable user-space thread
implementations that contain little or no machine-specific code. What
if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SRF?
What would be needed is basically some enhanced version of setjmp/longjmp
that actually saves the stack, rather than just resetting the stack pointer.
Since context switching would occur only at two well-defined places
(some return_next_row function that PLs call when an SRF returns a row,
and in the executor if no more previously returned rows from that SRF
are available), this wouldn't introduce the usual multithreading
headache, but would still allow switching in and out of the PL interpreter.
greetings, Florian Pflug
Florian G. Pflug wrote:
Just a thought - I believe that there are portable user-space thread
implementations that contain little or no machine-specific code. What
if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SRF?
What would be needed is basically some enhanced version of setjmp/longjmp
that actually saves the stack, rather than just resetting the stack pointer.
Since context switching would occur only at two well-defined places
(some return_next_row function that PLs call when an SRF returns a row,
and in the executor if no more previously returned rows from that SRF
are available), this wouldn't introduce the usual multithreading
headache, but would still allow switching in and out of the PL interpreter.
This just sounds horribly fragile.
Are we really sure that this isn't a solution in search of a problem?
cheers
andrew
So, I understand from all these opinions that much of the work is to be done
in the PL interpreter, not the PostgreSQL code itself. Am I right, or did I
miss something?
Regards
Islam Hegazy
----- Original Message -----
From: "Andrew Dunstan" <andrew@dunslane.net>
To: "Florian G. Pflug" <fgp@phlo.org>
Cc: "Gregory Stark" <stark@enterprisedb.com>; "Richard Huxton"
<dev@archonet.com>; "Heikki Linnakangas" <heikki@enterprisedb.com>; "Tom
Lane" <tgl@sss.pgh.pa.us>; "Neil Conway" <neilc@samurai.com>; "Martijn van
Oosterhout" <kleptog@svana.org>; "Islam Hegazy" <islheg@hotmail.com>;
<pgsql-hackers@postgresql.org>
Sent: Monday, March 19, 2007 12:18 PM
Subject: Re: [HACKERS] modifying the table function
Andrew Dunstan wrote:
Are we really sure that this isn't a solution in search of a problem?
The need for value-per-call is real (examples mentioned down-thread) and
was anticipated from day one of the SRF implementation (in fact the
first patch I wrote was value-per-call, not materialize). But when we
realized that value-per-call was not going to work very well for any PL
*except* C-functions, we switched to SFRM_Materialize as the only
supported mode, with SFRM_ValuePerCall left as a to-be-coded-later
option (see SetFunctionReturnMode in execnodes.h).
Personally I think it is worth having SFRM_ValuePerCall even if only C
functions can make use of it.
Joe
Joe Conway wrote:
Andrew Dunstan wrote:
Are we really sure that this isn't a solution in search of a problem?
The need for value-per-call is real (examples mentioned down-thread)
and was anticipated from day one of the SRF implementation (in fact
the first patch I wrote was value-per-call, not materialize). But when
we realized that value-per-call was not going to work very well for
any PL *except* C-functions, we switched to SFRM_Materialize as the
only supported mode, with SFRM_ValuePerCall left as a
to-be-coded-later option (see SetFunctionReturnMode in execnodes.h).
Personally I think it is worth having SFRM_ValuePerCall even if only C
functions can make use of it.
Yeah, makes plenty of sense for C funcs. I don't think there's an
argument about that. But for that we don't need any threading
infrastructure.
cheers
andrew
Andrew Dunstan wrote:
Joe Conway wrote:
Andrew Dunstan wrote:
Are we really sure that this isn't a solution in search of a problem?
The need for value-per-call is real (examples mentioned down-thread)
and was anticipated from day one of the SRF implementation (in fact
the first patch I wrote was value-per-call, not materialize). But when
we realized that value-per-call was not going to work very well for
any PL *except* C-functions, we switched to SFRM_Materialize as the
only supported mode, with SFRM_ValuePerCall left as a
to-be-coded-later option (see SetFunctionReturnMode in execnodes.h).
Personally I think it is worth having SFRM_ValuePerCall even if only C
functions can make use of it.
Yeah, makes plenty of sense for C funcs. I don't think there's an
argument about that. But for that we don't need any threading
infrastructure.
Oh sure -- sorry I wasn't clear. I wasn't trying to support the idea of
threading so much as the idea that value-per-call itself has merit for a
number of use cases.
Joe
"Florian G. Pflug" <fgp@phlo.org> writes:
Since context switching would occur only at two well-defined places
(some return_next_row function that PLs call when an SRF returns a row,
and in the executor if no more previously returned rows from that SRF
are available), this wouldn't introduce the usual multithreading
headache...
Yes it would. Consider what happens if the PL function calls into SPI to
execute a query....
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Andrew Dunstan wrote:
Florian G. Pflug wrote:
Just a thought - I believe that there are portable user-space thread
implementations that contain little or no machine-specific code. What
if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SRF?
What would be needed is basically some enhanced version of setjmp/longjmp
that actually saves the stack, rather than just resetting the stack pointer.
Since context switching would occur only at two well-defined places
(some return_next_row function that PLs call when an SRF returns a row,
and in the executor if no more previously returned rows from that SRF
are available), this wouldn't introduce the usual multithreading
headache, but would still allow switching in and out of the PL interpreter.
This just sounds horribly fragile.
Why would it be? It's about the same as running postgresql in one thread,
and some PL in another. This should only cause trouble if both use some
non-reentrant libc-functions. But even that wouldn't matter because of
the well-defined context switching points.
Here is a paper about portable userspace threads that I just googled.
http://www.gnu.org/software/pth/rse-pmt.ps
Are we really sure that this isn't a solution in search of a problem?
I think this really depends on how you define "problem". Some people
might think that "select * from myfunc(...) limit 1" should stop and
return a result after myfunc(...) has returned one row. Others will
say "well, just use a different software design that doesn't depend
on this optimization".
greetings, Florian Pflug