BUG #13960: plpython fails with certain function names

Started by Jim Nasbyabout 10 years ago10 messagesbugs
Jump to latest
#1Jim Nasby
Jim.Nasby@BlueTreble.com

The following bug has been logged on the website:

Bug reference: 13960
Logged by: Jim Nasby
Email address: Jim.Nasby@BlueTreble.com
PostgreSQL version: 9.5.0
Operating system: OS X
Description:

If a Postgres function contains characters that are illegal for python
identifiers, compilation fails. Error message is not very helpful either:

~@decina.local/21474# CREATE FUNCTION "test[]"() RETURNS text LANGUAGE
plpythonu AS $$return 'test'$$;
ERROR: could not compile PL/Python function "test[]"
DETAIL: SyntaxError: invalid syntax (<string>, line 1)
~@decina.local/21474# CREATE FUNCTION "test"() RETURNS text LANGUAGE
plpythonu AS $$return 'test'$$;
CREATE FUNCTION
~@decina.local/21474#

One possibility is to simply strip out invalid characters[1]https://docs.python.org/2/reference/lexical_analysis.html#grammar-token-identifier identifier ::= (letter|"_") (letter | digit | "_")* letter ::= lowercase | uppercase lowercase ::= "a"..."z" uppercase ::= "A"..."Z" digit ::= "0"..."9". What's valid
expands in 3.5, but I don't think that really matters.

Another possibility would be to catch the python exception, but that's
probably more trouble than it's worth.

[1]: https://docs.python.org/2/reference/lexical_analysis.html#grammar-token-identifier identifier ::= (letter|"_") (letter | digit | "_")* letter ::= lowercase | uppercase lowercase ::= "a"..."z" uppercase ::= "A"..."Z" digit ::= "0"..."9"
https://docs.python.org/2/reference/lexical_analysis.html#grammar-token-identifier
identifier ::= (letter|"_") (letter | digit | "_")*
letter ::= lowercase | uppercase
lowercase ::= "a"..."z"
uppercase ::= "A"..."Z"
digit ::= "0"..."9"

3.5:
https://docs.python.org/3.5/reference/lexical_analysis.html#grammar-token-identifier

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#1)
Re: BUG #13960: plpython fails with certain function names

Jim.Nasby@BlueTreble.com writes:

If a Postgres function contains characters that are illegal for python
identifiers, compilation fails. Error message is not very helpful either:

Hm, how much do we really care? The example seems kinda artificial.

One possibility is to simply strip out invalid characters[1].

No, because then you would get collisions, ie function names that look
different to PG would look the same to python. Bad news.

(Actually, don't we have that issue anyway because of schemas? I wonder
why we are exposing the PG name of the function to python at all.)

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#2)
Re: BUG #13960: plpython fails with certain function names

On 2/14/16 6:49 PM, Tom Lane wrote:

Jim.Nasby@BlueTreble.com writes:

If a Postgres function contains characters that are illegal for python
identifiers, compilation fails. Error message is not very helpful either:

Hm, how much do we really care? The example seems kinda artificial.

I did run across it creating cast functions programmatically, roughly as

format( 'CREATE FUNCTION "ndarray_to_%1s"', input_type::regtype )

which blows up when you feed it something like float. The actual example
was doing something similar but for an array.

I agree it's not exactly a high priority thing to support, but the error
text is completely useless unless you happen to know that plpython is
creating an actual python function.

One possibility is to simply strip out invalid characters[1].

No, because then you would get collisions, ie function names that look
different to PG would look the same to python. Bad news.

IIRC we append the regproc OID to ensure it's unique.

(Actually, don't we have that issue anyway because of schemas? I wonder
why we are exposing the PG name of the function to python at all.)

Not sure. Maybe useful in a call stack inside python? I can't really
think of what else you'd use it for. (I assume you're suggesting just
call the function names something like plpython_<regproc::oid>.)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#3)
Re: BUG #13960: plpython fails with certain function names

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

On 2/14/16 6:49 PM, Tom Lane wrote:

No, because then you would get collisions, ie function names that look
different to PG would look the same to python. Bad news.

IIRC we append the regproc OID to ensure it's unique.

Ah, okay, problem solved.

(Actually, don't we have that issue anyway because of schemas? I wonder
why we are exposing the PG name of the function to python at all.)

Not sure. Maybe useful in a call stack inside python? I can't really
think of what else you'd use it for. (I assume you're suggesting just
call the function names something like plpython_<regproc::oid>.)

Yeah, that's what I was thinking about. But yes, if we append the OID
anyway then we might as well just strip all non-alphanumeric chars
from the name. Safe and you still get some debuggability.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#4)
Re: BUG #13960: plpython fails with certain function names

On 2/14/16 7:09 PM, Tom Lane wrote:

Yeah, that's what I was thinking about. But yes, if we append the OID
anyway then we might as well just strip all non-alphanumeric chars
from the name. Safe and you still get some debuggability.

Attached. I opted not to modify the name in-place. If it's safe to
modify in place, might want to just replace invalid characters with '_'
instead of making a copy.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments:

plpython_name.patchtext/plain; charset=UTF-8; name=plpython_name.patch; x-mac-creator=0; x-mac-type=0Download+23-6
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#5)
Re: BUG #13960: plpython fails with certain function names

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

On 2/14/16 7:09 PM, Tom Lane wrote:

Yeah, that's what I was thinking about. But yes, if we append the OID
anyway then we might as well just strip all non-alphanumeric chars
from the name. Safe and you still get some debuggability.

Attached. I opted not to modify the name in-place. If it's safe to
modify in place, might want to just replace invalid characters with '_'
instead of making a copy.

I like the idea of replacing invalid characters with '_'. It's definitely
not safe to scribble on the pg_proc tuple, but we could get the same
result with a few wasted cycles by rescanning the procName string after
building it, as per attached.

regards, tom lane

Attachments:

plpython_name_2.patchtext/x-diff; charset=us-ascii; name=plpython_name_2.patchDownload+24-14
#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#6)
Re: BUG #13960: plpython fails with certain function names

On 2/16/16 7:49 PM, Tom Lane wrote:

I like the idea of replacing invalid characters with '_'. It's definitely
not safe to scribble on the pg_proc tuple, but we could get the same
result with a few wasted cycles by rescanning the procName string after
building it, as per attached.

Heck, I didn't even think about that. Yeah, it's going to scan another
20 bytes or so, but this certainly isn't performance critical.

BTW, I didn't bother checking this with python 3.5, but I can't fathom
how that would matter here.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#7)
Re: BUG #13960: plpython fails with certain function names

Jim Nasby <Jim.Nasby@bluetreble.com> writes:

BTW, I didn't bother checking this with python 3.5, but I can't fathom
how that would matter here.

Me either, but the buildfarm will soon tell us if it does.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Pedro Gimeno
parigalo@formauri.es
In reply to: Tom Lane (#6)
Re: BUG #13960: plpython fails with certain function names

Tom Lane wrote, On 2016-02-17 02:49:

+ 	/* Replace any not-legal-in-Python-names characters with '_' */
+ 	for (ptr = procName; *ptr; ptr++)
+ 	{
+ 		if (!((*ptr >= 'A' && *ptr <= 'Z') ||
+ 			  (*ptr >= 'a' && *ptr <= 'z') ||
+ 			  (*ptr >= '0' && *ptr <= '9')))
+ 			*ptr = '_';
+ 	}
+ 

That may blow on names that start with a digit. Maybe

+ (*ptr >= '0' && *ptr <= '9' && ptr != procName)))

Making the testcase start with a digit sounds like a good idea.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pedro Gimeno (#9)
Re: BUG #13960: plpython fails with certain function names

On 2/17/16 8:36 PM, Pedro Gimeno wrote:

Tom Lane wrote, On 2016-02-17 02:49:

+ 	/* Replace any not-legal-in-Python-names characters with '_' */
+ 	for (ptr = procName; *ptr; ptr++)
+ 	{
+ 		if (!((*ptr >= 'A' && *ptr <= 'Z') ||
+ 			  (*ptr >= 'a' && *ptr <= 'z') ||
+ 			  (*ptr >= '0' && *ptr <= '9')))
+ 			*ptr = '_';
+ 	}
+

That may blow on names that start with a digit. Maybe

+ (*ptr >= '0' && *ptr <= '9' && ptr != procName)))

Making the testcase start with a digit sounds like a good idea.

Postgres prefaces the name of the python function with a fixed string,
so it will never start with a digit.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs