pg_proc.probin should become text?

Started by Tom Laneover 16 years ago12 messages
#1Tom Lane
tgl@sss.pgh.pa.us

pg_proc.probin is currently declared as being type bytea. Perhaps that
had some real utility once upon a time, but no currently defined
procedural language stores binary data there. In fact, the only
thing that gets stored there is the shared-library file name for
C functions. To make things even more fun, none of the code touching
probin is doing anything to honor the special escaping conventions for
bytea. This was pretty broken already, and might perhaps explain some
of the weirder error reports we've gotten. It is now obvious to me
that no shlib file name containing non-ASCII characters will correctly
survive a dump/reload, in any existing PG release. Backslashes
accidentally fail to fail, at least for some values of
standard_conforming_strings.

However, with the hex bytea output patch in place, it's *completely*
broken. I offer the following pg_dump output:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
LANGUAGE c
AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c', 'interpt_pp';

which should of course have looked like this:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
LANGUAGE c
AS '/home/postgres/testversion/src/test/regress/regress.sl', 'interpt_pp';

I think that the least painful solution might be to change
pg_proc.probin to be declared as text. Otherwise we're going to need
version-specific klugery in pg_dump and who knows where else.

Comments?

regards, tom lane

#2Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#1)
Re: pg_proc.probin should become text?

On Mon, Aug 3, 2009 at 10:03 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

pg_proc.probin is currently declared as being type bytea.  Perhaps that
had some real utility once upon a time, but no currently defined
procedural language stores binary data there.  In fact, the only
thing that gets stored there is the shared-library file name for
C functions.  To make things even more fun, none of the code touching
probin is doing anything to honor the special escaping conventions for
bytea.  This was pretty broken already, and might perhaps explain some
of the weirder error reports we've gotten.  It is now obvious to me
that no shlib file name containing non-ASCII characters will correctly
survive a dump/reload, in any existing PG release.  Backslashes
accidentally fail to fail, at least for some values of
standard_conforming_strings.

However, with the hex bytea output patch in place, it's *completely*
broken.  I offer the following pg_dump output:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
   LANGUAGE c
   AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c', 'interpt_pp';

which should of course have looked like this:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
   LANGUAGE c
   AS '/home/postgres/testversion/src/test/regress/regress.sl', 'interpt_pp';

I think that the least painful solution might be to change
pg_proc.probin to be declared as text.  Otherwise we're going to need
version-specific klugery in pg_dump and who knows where else.

Comments?

Will that require a special hack in pg_migrator?

...Robert

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: pg_proc.probin should become text?

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Aug 3, 2009 at 10:03 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

I think that the least painful solution might be to change
pg_proc.probin to be declared as text. �Otherwise we're going to need
version-specific klugery in pg_dump and who knows where else.

Will that require a special hack in pg_migrator?

No, pg_dump (and hence pg_migrator) wouldn't care.

If we thought we were going to try to back-patch a fix for the
non-ASCII-char problem into older releases, then some other approach
would be preferable. But given the fact that this hasn't gotten
complained of (at least not enough to be identified) in all the years
since Berkeley, I can't see doing the pushups that would be needed to
back-patch a fix. It looks to me like everyone has been effectively
assuming that probin stores text, and we should just make it really
truly do that.

regards, tom lane

#4David Fetter
david@fetter.org
In reply to: Tom Lane (#1)
Re: pg_proc.probin should become text?

On Mon, Aug 03, 2009 at 10:03:11PM -0400, Tom Lane wrote:

However, with the hex bytea output patch in place, it's *completely*
broken. I offer the following pg_dump output:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
LANGUAGE c
AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c', 'interpt_pp';

which should of course have looked like this:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
LANGUAGE c
AS '/home/postgres/testversion/src/test/regress/regress.sl', 'interpt_pp';

Oh, of course! How could I have been so dense? ;)

I think that the least painful solution might be to change
pg_proc.probin to be declared as text. Otherwise we're going to need
version-specific klugery in pg_dump and who knows where else.

Comments?

+1 on changing it to text.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: pg_proc.probin should become text?

On Mon, Aug 3, 2009 at 10:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Aug 3, 2009 at 10:03 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

I think that the least painful solution might be to change
pg_proc.probin to be declared as text.  Otherwise we're going to need
version-specific klugery in pg_dump and who knows where else.

Will that require a special hack in pg_migrator?

No, pg_dump (and hence pg_migrator) wouldn't care.

If we thought we were going to try to back-patch a fix for the
non-ASCII-char problem into older releases, then some other approach
would be preferable.  But given the fact that this hasn't gotten
complained of (at least not enough to be identified) in all the years
since Berkeley, I can't see doing the pushups that would be needed to
back-patch a fix.  It looks to me like everyone has been effectively
assuming that probin stores text, and we should just make it really
truly do that.

Sounds good to me, then.

...Robert

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#1)
Re: pg_proc.probin should become text?

2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:

pg_proc.probin is currently declared as being type bytea.  Perhaps that
had some real utility once upon a time, but no currently defined
procedural language stores binary data there.  In fact, the only
thing that gets stored there is the shared-library file name for
C functions.  To make things even more fun, none of the code touching
probin is doing anything to honor the special escaping conventions for
bytea.  This was pretty broken already, and might perhaps explain some
of the weirder error reports we've gotten.  It is now obvious to me
that no shlib file name containing non-ASCII characters will correctly
survive a dump/reload, in any existing PG release.  Backslashes
accidentally fail to fail, at least for some values of
standard_conforming_strings.

However, with the hex bytea output patch in place, it's *completely*
broken.  I offer the following pg_dump output:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
   LANGUAGE c
   AS '\\x2f686f6d652f706f7374677265732f706773716c2f7372632f746573742f726567726573732f726567726573732e736c', 'interpt_pp';

which should of course have looked like this:

CREATE FUNCTION interpt_pp(path, path) RETURNS point
   LANGUAGE c
   AS '/home/postgres/testversion/src/test/regress/regress.sl', 'interpt_pp';

I think that the least painful solution might be to change
pg_proc.probin to be declared as text.  Otherwise we're going to need
version-specific klugery in pg_dump and who knows where else.

Comments?

-1

correct solution is moving the path to prosrc col or create new colum
"externalproc". I thing so probin should be useful for plpgsql -
sometime we should to store parser result from plpgsql compilation
stage in this column. So my option is don't change native sense of
this column. If we actually have not using, the we should to drop it,
not create some breaks in future.

Pavel Stehule

Show quoted text

                       regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#6)
Re: pg_proc.probin should become text?

Pavel Stehule <pavel.stehule@gmail.com> writes:

2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:

I think that the least painful solution might be to change
pg_proc.probin to be declared as text.

correct solution is moving the path to prosrc col or create new colum
"externalproc". I thing so probin should be useful for plpgsql -
sometime we should to store parser result from plpgsql compilation
stage in this column. So my option is don't change native sense of
this column. If we actually have not using, the we should to drop it,
not create some breaks in future.

[ shrug... ] Can't get excited about it. What you propose would be a
significant amount of work and added complexity, because pg_dump or
somebody would have to take care to translate existing function
definitions properly. And the benefit is purely speculative.
I seriously doubt that serializing plpgsql compilation trees into a
bytea representation would be worth anyone's time. The original
Berkeley authors probably had some idea of storing compiled machine code
in probin, but nobody's done that either, and I'll bet long odds that
nobody ever will. (The advantage compared to external C functions
implemented as we currently know them seems negligible. The Berkeley
folk probably did not foresee the prevalence of loadable-shared-library
support, nor the general security-driven move away from allowing
execution of writable memory.)

regards, tom lane

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#7)
Re: pg_proc.probin should become text?

2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:

Show quoted text

Pavel Stehule <pavel.stehule@gmail.com> writes:

2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:

I think that the least painful solution might be to change
pg_proc.probin to be declared as text.

correct solution is moving the path to prosrc col or create new colum
"externalproc". I thing so probin should be useful for plpgsql -
sometime we should to store parser result from plpgsql compilation
stage in this column. So my option is don't change native sense of
this column. If we actually have not using, the we should to drop it,
not create some breaks in future.

[ shrug... ]  Can't get excited about it.  What you propose would be a
significant amount of work and added complexity, because pg_dump or
somebody would have to take care to translate existing function
definitions properly.  And the benefit is purely speculative.
I seriously doubt that serializing plpgsql compilation trees into a
bytea representation would be worth anyone's time.  The original
Berkeley authors probably had some idea of storing compiled machine code
in probin, but nobody's done that either, and I'll bet long odds that
nobody ever will.  (The advantage compared to external C functions
implemented as we currently know them seems negligible.  The Berkeley
folk probably did not foresee the prevalence of loadable-shared-library
support, nor the general security-driven move away from allowing
execution of writable memory.)

                       regards, tom lane

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#7)
Re: pg_proc.probin should become text?

2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:

I think that the least painful solution might be to change
pg_proc.probin to be declared as text.

correct solution is moving the path to prosrc col or create new colum
"externalproc". I thing so probin should be useful for plpgsql -
sometime we should to store parser result from plpgsql compilation
stage in this column. So my option is don't change native sense of
this column. If we actually have not using, the we should to drop it,
not create some breaks in future.

[ shrug... ]  Can't get excited about it.  What you propose would be a
significant amount of work and added complexity, because pg_dump or
somebody would have to take care to translate existing function
definitions properly.  And the benefit is purely speculative.
I seriously doubt that serializing plpgsql compilation trees into a
bytea representation would be worth anyone's time.  The original
Berkeley authors probably had some idea of storing compiled machine code
in probin, but nobody's done that either, and I'll bet long odds that
nobody ever will.  (The advantage compared to external C functions
implemented as we currently know them seems negligible.  The Berkeley
folk probably did not foresee the prevalence of loadable-shared-library
support, nor the general security-driven move away from allowing
execution of writable memory.)

I agree, so information about patch would be store in text field. But
I am not sure, if your fix isn't too simply. I haven't plan to compile
plpgsql to C or to binary code. But could be interesting link postgres
with some virtual machine like parrot or lua vm, and translate plpgsql
to p code. It's maybe far future.

Early future is integration main SQL parser to plpgsql. I am not sure,
but maybe we will need some persistent cache for store parametrized
sql queries. I though about store it in probin column.

Pavel

Show quoted text

                       regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#9)
Re: pg_proc.probin should become text?

Pavel Stehule <pavel.stehule@gmail.com> writes:

I agree, so information about patch would be store in text field. But
I am not sure, if your fix isn't too simply. I haven't plan to compile
plpgsql to C or to binary code. But could be interesting link postgres
with some virtual machine like parrot or lua vm, and translate plpgsql
to p code. It's maybe far future.

Early future is integration main SQL parser to plpgsql. I am not sure,
but maybe we will need some persistent cache for store parametrized
sql queries. I though about store it in probin column.

Well, probin is the wrong place for any such thing anyhow. Regardless
of datatype issues, the system is clearly built on the assumption that
the value of probin is to be specified *by the user* in CREATE FUNCTION.
If you want a cache for derived information it would need to be a new
column.

(I remain of the opinion that caching such stuff in pg_proc would be
a bad design decision. Every data structure that goes to disk is
another data structure that you cannot easily change. There just isn't
enough gain there to justify the maintenance headaches.)

regards, tom lane

#11Alvaro Herrera
alvherre@commandprompt.com
In reply to: Pavel Stehule (#9)
Re: pg_proc.probin should become text?

Pavel Stehule escribi�:

I agree, so information about patch would be store in text field. But
I am not sure, if your fix isn't too simply. I haven't plan to compile
plpgsql to C or to binary code. But could be interesting link postgres
with some virtual machine like parrot or lua vm, and translate plpgsql
to p code. It's maybe far future.

In this case I think it would make more sense to compile the code and
keep the p-code in backend local memory. Keeping compiled code around
in more permanent or global storage would only make sense if you had
large amounts of code, and in that case I think we'd want to have a more
concrete and complete proposal.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#10)
Re: pg_proc.probin should become text?

2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I agree, so information about patch would be store in text field. But
I am not sure, if your fix isn't too simply. I haven't plan to compile
plpgsql to C or to binary code. But could be interesting link postgres
with some virtual machine like parrot or lua vm, and translate plpgsql
to p code. It's maybe far future.

Early future is integration main SQL parser to plpgsql. I am not sure,
but maybe we will need some persistent cache for store parametrized
sql queries. I though about store it in probin column.

Well, probin is the wrong place for any such thing anyhow.  Regardless
of datatype issues, the system is clearly built on the assumption that
the value of probin is to be specified *by the user* in CREATE FUNCTION.
If you want a cache for derived information it would need to be a new
column.

(I remain of the opinion that caching such stuff in pg_proc would be
a bad design decision.  Every data structure that goes to disk is
another data structure that you cannot easily change.  There just isn't
enough gain there to justify the maintenance headaches.)

ook, I agree - but I am not sure, so some cache will be necessary.

Pavel

Show quoted text

                       regards, tom lane