First feature patch for plperl - draft [PATCH]

Started by Tim Bunce over 16 years ago · 29 messages · hackers
#1Tim Bunce
Tim.Bunce@pobox.com

Building on my earlier plperl refactoring patch, here's a draft of my
first plperl feature patch.

Significant changes in this patch:

- New GUC plperl.on_perl_init='...perl...' for admin use.
- New GUC plperl.on_trusted_init='...perl...' for plperl user use.
- New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.
- END blocks now run at backend exit (fixes bug #5066).
- Stored procedure subs are now given names ($name__$oid).
- More error checking and reporting.
- Warnings no longer have an extra newline in the NOTICE text.
- Various minor optimizations like pre-growing data structures.

I'm working on adding tests and documentation now, meanwhile I'd very
much appreciate any feedback on the patch.

Tim.

p.s. Once this patch is complete I plan to work on patches that:
- add quote_literal and quote_identifier functions in C.
- generalize the Safe setup code to enable more control.
- formalize namespace usage, moving things out of main::
- add a way to perform inter-sub calling (at least for simple cases).
- possibly rewrite _plperl_to_pg_array in C.

Attachments:

master-plperl-feature1.patch (text/x-patch; charset=us-ascii; +309 -202)
#2David E. Wheeler
david@kineticode.com
In reply to: Tim Bunce (#1)
Re: First feature patch for plperl - draft [PATCH]

On Dec 3, 2009, at 3:30 PM, Tim Bunce wrote:

- New GUC plperl.on_perl_init='...perl...' for admin use.
- New GUC plperl.on_trusted_init='...perl...' for plperl user use.
- New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.

Since there is no documentation yet, how do these work, exactly? Or should I just wait for the docs?

- END blocks now run at backend exit (fixes bug #5066).
- Stored procedure subs are now given names ($name__$oid).
- More error checking and reporting.
- Warnings no longer have an extra newline in the NOTICE text.
- Various minor optimizations like pre-growing data structures.

Nice.

I'm working on adding tests and documentation now, meanwhile I'd very
much appreciate any feedback on the patch.

Tim.

p.s. Once this patch is complete I plan to work on patches that:
- add quote_literal and quote_identifier functions in C.

I expect you can just use the C versions in PostgreSQL. They're in utils/builtins.h, along with quote_nullable(), which might also be useful to add.

- generalize the Safe setup code to enable more control.
- formalize namespace usage, moving things out of main::

Nice.

- add a way to perform inter-sub calling (at least for simple cases).
- possibly rewrite _plperl_to_pg_array in C.

Sounds great, Tim. I'm not really qualified to say anything about the C code, but I'd be happy to try it out once there are docs.

Best,

David

#3Tim Bunce
Tim.Bunce@pobox.com
In reply to: David E. Wheeler (#2)
Re: First feature patch for plperl - draft [PATCH]

On Thu, Dec 03, 2009 at 04:53:47PM -0800, David E. Wheeler wrote:

On Dec 3, 2009, at 3:30 PM, Tim Bunce wrote:

- New GUC plperl.on_perl_init='...perl...' for admin use.
- New GUC plperl.on_trusted_init='...perl...' for plperl user use.
- New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.

Since there is no documentation yet, how do these work, exactly? Or should I just wait for the docs?

The perl code in plperl.on_perl_init gets eval'd as soon as an
interpreter is created. That could be at server startup if
shared_preload_libraries is used. plperl.on_perl_init can only be set by
an admin (PGC_SUSET).

The perl code in plperl.on_trusted_init gets eval'd when an interpreter
is initialized into trusted mode, e.g., used for the plperl language.
The perl code is eval'd inside the Safe compartment.
plperl.on_trusted_init can be set by users but it's only useful if set
before the plperl interpreter is first used.

plperl.on_untrusted_init acts like plperl.on_trusted_init but for
plperlu code.

So, if all three were set then, before any perl stored procedure or DO
block is executed, the interpreter would have executed either
on_perl_init and then on_trusted_init (for plperl), or on_perl_init and
then on_untrusted_init (for plperlu).
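A minimal sketch of the ordering described above, in plain Perl. It is only a model: the %guc hash and init_interpreter() are hypothetical stand-ins for the real GUC machinery in plperl.c, and the trusted hook is run with a plain eval here rather than inside the Safe compartment.

```perl
use strict;
use warnings;

# Hypothetical settings; in the patch these would be GUC values.
my %guc = (
    'plperl.on_perl_init'      => 'push @main::trace, "on_perl_init"',
    'plperl.on_trusted_init'   => 'push @main::trace, "on_trusted_init"',
    'plperl.on_untrusted_init' => 'push @main::trace, "on_untrusted_init"',
);

our @trace;

# Model of interpreter setup: run on_perl_init first, then the hook that
# matches the language. Plain eval is used only to show the ordering.
sub init_interpreter {
    my ($language) = @_;    # 'plperl' or 'plperlu'
    my $mode_hook = $language eq 'plperl'
        ? 'plperl.on_trusted_init'
        : 'plperl.on_untrusted_init';

    @trace = ();
    for my $name ('plperl.on_perl_init', $mode_hook) {
        my $code = $guc{$name};
        next unless defined $code && length $code;
        eval "$code; 1" or die "$name failed: $@";
    }
    return [@trace];
}

print join(' -> ', @{ init_interpreter('plperl') }),  "\n";
print join(' -> ', @{ init_interpreter('plperlu') }), "\n";
```

Each call runs exactly two hooks, and on_perl_init always comes first.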

- END blocks now run at backend exit (fixes bug #5066).
- Stored procedure subs are now given names ($name__$oid).
- More error checking and reporting.
- Warnings no longer have an extra newline in the NOTICE text.
- Various minor optimizations like pre-growing data structures.

Nice.

Thanks.

I'm working on adding tests and documentation now, meanwhile I'd very
much appreciate any feedback on the patch.

Tim.

p.s. Once this patch is complete I plan to work on patches that:
- add quote_literal and quote_identifier functions in C.

I expect you can just use the C versions in PostgreSQL. They're in utils/builtins.h,

That's my plan. (I've been discussing this and other issues with Andrew
Dunstan via IM.)

along with quote_nullable(), which might also be useful to add.

I was planning to build that behaviour into quote_literal since it fits
naturally into perl's idea of undef and mirrors DBI's quote() method.
So:
quote_literal(undef) => "NULL"
quote_literal('foo') => "'foo'"
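
A rough pure-Perl sketch of that behaviour, for illustration only. The plan in this thread is to call the C versions from utils/builtins.h; this stand-in only doubles single quotes and ignores the backslash handling that the server-side function performs.

```perl
use strict;
use warnings;

# Illustrative stand-in, not the planned C implementation.
sub quote_literal {
    my ($value) = @_;

    # undef maps to the SQL keyword NULL, as DBI's quote() does.
    return 'NULL' unless defined $value;

    # Double any embedded single quotes, then wrap in single quotes.
    (my $escaped = $value) =~ s/'/''/g;
    return "'$escaped'";
}

print quote_literal(undef),      "\n";
print quote_literal('foo'),      "\n";
print quote_literal("O'Reilly"), "\n";
```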

- generalize the Safe setup code to enable more control.

Specifically control what gets loaded into the Compartment, what gets
shared with it (e.g. sharing *a & *b as a workaround for the sort bug),
and what class to use for Safe (to enable deeper changes if desired via
subclassing). Naturally all this is only possible for admin (via
plperl.on_perl_init).

- formalize namespace usage, moving things out of main::

Nice.

- add a way to perform inter-sub calling (at least for simple cases).

My current plan here is to use an SP::AUTOLOAD to handle loading and
dispatching. So calling SP::some_random_procedure(...) will trigger
SP::AUTOLOAD to try to resolve "some_random_procedure" to a particular
stored procedure. There are three tricky parts: handling polymorphism (at
least "well enough"), making autoloading of stored procedures work
inside Safe, making it fast. I think I have reasonable approaches for
those but I won't know for sure till I work on it.
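
A minimal sketch of the AUTOLOAD dispatch idea in plain Perl. The %procedures hash is a hypothetical stand-in for resolving a name to a stored procedure; polymorphism and Safe are not addressed here. Installing the resolved sub into the SP:: symbol table is one possible way to make repeat calls fast, since they then bypass AUTOLOAD.

```perl
use strict;
use warnings;

package SP;

our $AUTOLOAD;

# Hypothetical registry standing in for a catalog lookup.
my %procedures = (
    add_one => sub { return $_[0] + 1 },
    greet   => sub { return "hello, $_[0]" },
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                 # strip the package prefix
    return if $name eq 'DESTROY';

    my $code = $procedures{$name}
        or die "no stored procedure named '$name'\n";

    # Cache the sub so later calls skip AUTOLOAD entirely.
    no strict 'refs';
    *{"SP::$name"} = $code;

    goto &$code;                       # call it with the original @_
}

package main;

print SP::add_one(41), "\n";
print SP::greet('world'), "\n";
```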

- possibly rewrite _plperl_to_pg_array in C.

Sounds great, Tim. I'm not really qualified to say anything about the
C code, but I'd be happy to try it out once there are docs.

Great. Thanks David.

Tim.

#4Jeff
threshar@torgo.978.org
In reply to: Tim Bunce (#3)
Re: First feature patch for plperl - draft [PATCH]

On Dec 4, 2009, at 6:18 AM, Tim Bunce wrote:

- generalize the Safe setup code to enable more control.

Is there any possible way to enable "use strict;" for plperl (trusted)
modules?
I would love to have that feature. Sure does help cut down on bugs and
makes things nicer.

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff (#4)
Re: First feature patch for plperl - draft [PATCH]

Jeff <threshar@threshar.is-a-geek.com> writes:

Is there any possible way to enable "use strict;" for plperl (trusted)
modules?

The plperl manual shows a way to do it using some weird syntax or
other. It'd sure be nice to be able to use the regular syntax though.

regards, tom lane

#6David E. Wheeler
david@kineticode.com
In reply to: Tim Bunce (#3)
Re: First feature patch for plperl - draft [PATCH]

On Dec 4, 2009, at 3:18 AM, Tim Bunce wrote:

The perl code in plperl.on_perl_init gets eval'd as soon as an
interpreter is created. That could be at server startup if
shared_preload_libraries is used. plperl.on_perl_init can only be set by
an admin (PGC_SUSET).

Are multiline GUCs allowed in the postgresql.conf file?

The perl code in plperl.on_trusted_init gets eval'd when an interpreter
is initialized into trusted mode, e.g., used for the plperl language.
The perl code is eval'd inside the Safe compartment.
plperl.on_trusted_init can be set by users but it's only useful if set
before the plperl interpreter is first used.

So immediately after connecting would be the place to make sure you do it, IOW.

plperl.on_untrusted_init acts like plperl.on_trusted_init but for
plperlu code.

So, if all three were set then, before any perl stored procedure or DO
block is executed, the interpreter would have executed either
on_perl_init and then on_trusted_init (for plperl), or on_perl_init and
then on_untrusted_init (for plperlu).

Awesome, thanks! This is really a great feature.

along with quote_nullable(), which might also be useful to add.

I was planning to build that behaviour into quote_literal since it fits
naturally into perl's idea of undef and mirrors DBI's quote() method.
So:
quote_literal(undef) => "NULL"
quote_literal('foo') => "'foo'"

Is there an existing `quote_literal()` in PL/Perl? If so, you might not want to change its behavior.

- generalize the Safe setup code to enable more control.

Specifically control what gets loaded into the Compartment, what gets
shared with it (e.g. sharing *a & *b as a workaround for the sort bug),
and what class to use for Safe (to enable deeper changes if desired via
subclassing). Naturally all this is only possible for admin (via
plperl.on_perl_init).

Sounds good.

- formalize namespace usage, moving things out of main::

Nice.

- add a way to perform inter-sub calling (at least for simple cases).

My current plan here is to use an SP::AUTOLOAD to handle loading and
dispatching. So calling SP::some_random_procedure(...) will trigger
SP::AUTOLOAD to try to resolve "some_random_procedure" to a particular
stored procedure. There are three tricky parts: handling polymorphism (at
least "well enough"), making autoloading of stored procedures work
inside Safe, making it fast. I think I have reasonable approaches for
those but I won't know for sure till I work on it.

I'm wondering if there might be some way to use some sort of attributes to identify data types passed to a PL/Perl function called from another PL/Perl function. Maybe some other functions that identify types, in the case of ambiguities?

foo(int(1), text('bar'));

? Kind of ugly, but perhaps only to be used if there are ambiguities? Not sure it's a great idea, mind. Just thinking out loud (so to speak).

Best,

David

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#6)
Re: First feature patch for plperl - draft [PATCH]

"David E. Wheeler" <david@kineticode.com> writes:

On Dec 4, 2009, at 3:18 AM, Tim Bunce wrote:

The perl code in plperl.on_perl_init gets eval'd as soon as an
interpreter is created. That could be at server startup if
shared_preload_libraries is used. plperl.on_perl_init can only be set by
an admin (PGC_SUSET).

Are multiline GUCs allowed in the postgresql.conf file?

I don't think so. In any case this seems like an extreme abuse of the
concept of a GUC, as well as being a solution in search of a problem,
as well as being something that should absolutely not ever happen inside
the postmaster process for both reliability and security reasons.
I vote a big no on this.

regards, tom lane

#8David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#7)
Re: First feature patch for plperl - draft [PATCH]

On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:

Are multiline GUCs allowed in the postgresql.conf file?

I don't think so. In any case this seems like an extreme abuse of the
concept of a GUC, as well as being a solution in search of a problem,
as well as being something that should absolutely not ever happen inside
the postmaster process for both reliability and security reasons.
I vote a big no on this.

That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it PGInit or something, and then just make the GUC:

plperl.on_perl_init = 'use PGInit;'
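
A sketch of what such a module might contain. PGInit is the hypothetical name from above, and the preloaded modules and the loaded_modules() helper are illustrative choices only; the file would have to live somewhere in the server's @INC.

```perl
# PGInit.pm - hypothetical admin-maintained module
package PGInit;

use strict;
use warnings;

# Preload modules that PL/Perl functions on this server are assumed to need.
use List::Util ();
use Scalar::Util ();

# Hypothetical helper: report which of the preloaded files are in %INC.
sub loaded_modules {
    return sort grep { m{^(?:List|Scalar)/} } keys %INC;
}

1;
```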

Best,

David

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: First feature patch for plperl - draft [PATCH]

Tom Lane wrote:

Jeff <threshar@threshar.is-a-geek.com> writes:

Is there any possible way to enable "use strict;" for plperl (trusted)
modules?

The plperl manual shows a way to do it using some weird syntax or
other. It'd sure be nice to be able to use the regular syntax though.

As is documented, all you have to do is have:

custom_variable_classes = 'plperl'
plperl.use_strict = 'true'

in your config. You only need to put the documented BEGIN block in your
function body if you want to do use strict mode on a case by case basis.

We can't allow an unrestricted "use strict;" in plperl functions because
it invokes an operation (require) that Safe.pm rightly regards as unsafe.
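
For illustration, here is a plain-Perl sketch of why a BEGIN block that calls strict->import() has the same compile-time effect as "use strict" without needing require. The block shown is assumed to be the one the manual documents. The explicit require stands in for PL/Perl having loaded strict.pm before the function is compiled.

```perl
BEGIN { require strict }    # stand-in: strict.pm is assumed to be loaded already

# With strict vars off, an undeclared global compiles fine.
my $loose_ok = eval q{
    no strict 'vars';
    $undeclared_a = 1;
    1;
};

# With the BEGIN block, the same mistake becomes a compile-time error.
my $strict_ok = eval q{
    BEGIN { strict->import(); }
    $undeclared_b = 1;
    1;
};
my $error = $@;

print "loose compiled:  ", ($loose_ok  ? "yes" : "no"), "\n";
print "strict compiled: ", ($strict_ok ? "yes" : "no"), "\n";
print "error: $error";
```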

cheers

andrew

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#8)
Re: First feature patch for plperl - draft [PATCH]

"David E. Wheeler" <david@kineticode.com> writes:

On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:

I vote a big no on this.

That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it PGInit or something, and then just make the GUC:

plperl.on_perl_init = 'use PGInit;'

No, you missed the point: I'm objecting to having any such thing as
plperl.on_perl_init, full stop.

Aside from the points I already made, it's not even well defined.
What is to happen if the admin changes the value when the system
is already up?

regards, tom lane

#11Jeff
threshar@torgo.978.org
In reply to: Andrew Dunstan (#9)
Re: First feature patch for plperl - draft [PATCH]

On Dec 4, 2009, at 1:44 PM, Andrew Dunstan wrote:

As is documented, all you have to do is have:

custom_variable_classes = 'plperl'
plperl.use_strict = 'true'

in your config. You only need to put the documented BEGIN block in
your function body if you want to do use strict mode on a case by
case basis.

We can't allow an unrestricted "use strict;" in plperl functions
because it invokes an operation (require) that Safe.pm rightly
regards as unsafe.

Yeah, saw that in the manual in the plperl functions & arguments page
(at the bottom).
I think my confusion came up because I'd read the trust/untrusted
thing which removes the ability to use use/require.

Maybe a blurb or moving that chunk of doc to the trusted/untrusted
page might make that tidbit easier to find?

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: First feature patch for plperl - draft [PATCH]

On Fri, Dec 4, 2009 at 1:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David E. Wheeler" <david@kineticode.com> writes:

On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:

I vote a big no on this.

That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it PGInit or something, and then just make the GUC:

    plperl.on_perl_init = 'use PGInit;'

No, you missed the point: I'm objecting to having any such thing as
plperl.on_perl_init, full stop.

Aside from the points I already made, it's not even well defined.
What is to happen if the admin changes the value when the system
is already up?

So, do we look for another way to provide the functionality besides
having a GUC, or is the functionality itself bad?

...Robert

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: First feature patch for plperl - draft [PATCH]

Robert Haas <robertmhaas@gmail.com> writes:

So, do we look for another way to provide the functionality besides
having a GUC, or is the functionality itself bad?

I don't think we want random Perl code running inside the postmaster,
no matter what the API to cause it is. I might hold my nose for "on
load" code if it can only run in backends, though I still say that
it's a badly designed concept because of the uncertainty about who
will run what when. Shlib load time is not an event that ought to be
user-visible.

regards, tom lane

#14David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#10)
Re: First feature patch for plperl - draft [PATCH]

On Dec 4, 2009, at 10:51 AM, Tom Lane wrote:

plperl.on_perl_init = 'use PGInit;'

No, you missed the point: I'm objecting to having any such thing as
plperl.on_perl_init, full stop.

Aside from the points I already made, it's not even well defined.
What is to happen if the admin changes the value when the system
is already up?

Nothing. Hence the "init".

Best,

David

#15David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#13)
Re: First feature patch for plperl - draft [PATCH]

On Dec 4, 2009, at 11:05 AM, Tom Lane wrote:

So, do we look for another way to provide the functionality besides
having a GUC, or is the functionality itself bad?

I don't think we want random Perl code running inside the postmaster,
no matter what the API to cause it is. I might hold my nose for "on
load" code if it can only run in backends, though I still say that
it's a badly designed concept because of the uncertainty about who
will run what when. Shlib load time is not an event that ought to be
user-visible.

So only the child processes would be allowed to load the code? That could make connections even slower if there's a lot of Perl code to be added, though that's also the issue we have today. I guess I could live with that, though I'd rather have such code shared across processes.

If it's a badly designed concept, do you have any ideas that are less bad?

Best,

David

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: First feature patch for plperl - draft [PATCH]

On Fri, Dec 4, 2009 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

So, do we look for another way to provide the functionality besides
having a GUC, or is the functionality itself bad?

I don't think we want random Perl code running inside the postmaster,
no matter what the API to cause it is.  I might hold my nose for "on
load" code if it can only run in backends, though I still say that
it's a badly designed concept because of the uncertainty about who
will run what when.  Shlib load time is not an event that ought to be
user-visible.

I agree that the uncertainty is not a wonderful thing, but e.g. Apache
has the same problem with mod_perl, and you just deal with it. I
choose to deal with it by doing "apachectl graceful" every time I
change the source code; or you can install Perl modules that check
whether the mod-times on the other modules you've loaded have changed
and reload them if so. In practice, being able to pre-load the Perl
libraries you're going to want to execute is absolutely essential if
you don't want performance to be in the toilet. My code base is so
large now that it takes 3 or 4 seconds for Apache to pull it all in on
my crappy dev box, but it's blazingly fast once it's up and running.
Having that be something that happens on the production server only
once a week or once a month when I roll out a new release rather than
any more frequently is really important.

...Robert

#17Tim Bunce
Tim.Bunce@pobox.com
In reply to: Tom Lane (#5)
Re: First feature patch for plperl - draft [PATCH]

On Fri, Dec 04, 2009 at 11:01:42AM -0500, Tom Lane wrote:

Jeff <threshar@threshar.is-a-geek.com> writes:

Is there any possible way to enable "use strict;" for plperl (trusted)
modules?

The plperl manual shows a way to do it using some weird syntax or
other. It'd sure be nice to be able to use the regular syntax though.

Finding a solution is definitely on my list. I've spent a little time
exploring this already but haven't found a simple solution yet.

The neatest would have been overriding &CORE::GLOBAL::require but sadly
the Safe/Opcode mechanism takes priority over that and forbids compiling
code that does a use/require.

I may end up re-enabling the require opcode but redirecting it to run
some C code in plperl.c (the same 'opcode redirection' technique used by
my NYTProf profiler). That C code would only need to throw an exception
if the module hasn't been loaded already.
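
A small Perl model of that policy only, not of the opcode redirection itself, which would live in C. The function name require_if_preloaded is hypothetical: it succeeds if the module is already recorded in %INC and throws otherwise, without ever loading anything.

```perl
use strict;
use warnings;

# Hypothetical model of the proposed check: never load code, only confirm
# that the requested module was loaded before the interpreter was locked down.
sub require_if_preloaded {
    my ($module) = @_;

    # Convert Foo::Bar to Foo/Bar.pm, the form used for %INC keys.
    (my $file = "$module.pm") =~ s{::}{/}g;

    die "Unable to load $module: not preloaded into this interpreter\n"
        unless exists $INC{$file};
    return 1;
}

print "strict: ok\n" if require_if_preloaded('strict');

eval { require_if_preloaded('Not::Preloaded::Module') };
print "error: $@";
```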

Tim.

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David E. Wheeler (#15)
Re: First feature patch for plperl - draft [PATCH]

David E. Wheeler wrote:

If it's a badly designed concept, do you have any ideas that are less bad?

I'm not sure that we want to duplicate this idea today, but in pltcl
there's a pltcl_modules table that is scanned on interpreter init and
loads user-defined code.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#19Tim Bunce
Tim.Bunce@pobox.com
In reply to: Tom Lane (#13)
Re: First feature patch for plperl - draft [PATCH]

On Fri, Dec 04, 2009 at 02:05:28PM -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

So, do we look for another way to provide the functionality besides
having a GUC, or is the functionality itself bad?

I don't think we want random Perl code running inside the postmaster,
no matter what the API to cause it is. I might hold my nose for "on
load" code if it can only run in backends, though I still say that
it's a badly designed concept because of the uncertainty about who
will run what when.

Robert's comparison with mod_perl is very apt. Preloading code gives
dramatic performance gains in production situations where there's a
significant codebase and connections are frequent.

The docs for plperl.on_perl_init could include a section relating to
its use with shared_preload_libraries. That could document any issues
and caveats you feel are important.

Tim.

#20Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tim Bunce (#19)
Re: First feature patch for plperl - draft [PATCH]

On Dec 4, 2009, at 20:40, Tim Bunce wrote:

Robert's comparison with mod_perl is very apt. Preloading code gives
dramatic performance gains in production situations where there's a
significant codebase and connections are frequent.

How far do you go with using a connection pooler such as pgbouncer?

--
dim

#21David E. Wheeler
david@kineticode.com
In reply to: Tim Bunce (#19)
#22Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#13)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#21)
#24Tim Bunce
Tim.Bunce@pobox.com
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tim Bunce (#24)
#26Andrew Dunstan
andrew@dunslane.net
In reply to: Tim Bunce (#24)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#26)
#28Tim Bunce
Tim.Bunce@pobox.com
In reply to: Tom Lane (#25)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#23)