Controlling changes in plpgsql variable resolution
As most of you will recall, plpgsql currently acts as though identifiers
in SQL queries should be resolved first as plpgsql variable names, and
only failing that do they get processed as names of the query. The
plpgsql parser rewrite that I'm working on will fix that for the
obviously-silly cases where a plpgsql variable is substituted for a
table name or some other non-scalar-variable identifier. However, what
should we do when a name could represent either a plpgsql variable
or a column of the query? Historically we've resolved it as the
plpgsql variable, but we've sure heard a lot of complaints about that.
Oracle's PL/SQL has the precedence the other way around: resolve first
as the query column, and only failing that as a PL variable. The Oracle
behavior is arguably less surprising because the query-provided names
belong to the nearer enclosing scope. I believe that we ought to move
to the Oracle behavior over time, but how do we get there from here?
Changing it is almost surely going to break a lot of people's functions,
and in rather subtle ways.
I think there are basically three behaviors that we could offer:
1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)
(Another possibility is to throw a warning but proceed anyway. It would
be easy to do that if we proceed with the Oracle behavior, but *not*
easy if we proceed with the historical PG behavior. The reason is that
the code invoked by transformColumnRef may have already made some
side-effects on the query tree. We discussed the implicit-RTE behavior
yesterday, but there are other effects of a successful name lookup,
such as marking columns for privilege checking.)
What I'm wondering about at the moment is which behaviors to offer and
how to control them. The obvious answer is "use a GUC" but that answer
scares me because of the ease with which switching between #1 and #2
would break plpgsql functions. It's not out of the question that that
could even amount to a security problem. I could see using a GUC to
turn the error behavior (#3) on and off, but not to switch between #1
and #2.
Another possibility is to control it on a per-function basis by adding
some special syntax to plpgsql function bodies to say which behavior
to use. We could for instance extend the never-documented "#option"
syntax. This is pretty ugly and would be inconvenient to use too
--- if people have to go and add "#option something" to a function,
they might as well just fix whatever name conflicts it has instead.
I'm not seeing any choice that seems likely to make everybody happy.
Any comments or ideas?
regards, tom lane
On Sun, 2009-10-18 at 13:25 -0400, Tom Lane wrote:
As most of you will recall, plpgsql currently acts as though identifiers
in SQL queries should be resolved first as plpgsql variable names, and
only failing that do they get processed as names of the query. The
plpgsql parser rewrite that I'm working on will fix that for the
obviously-silly cases where a plpgsql variable is substituted for a
table name or some other non-scalar-variable identifier. However, what
should we do when a name could represent either a plpgsql variable
or a column of the query? Historically we've resolved it as the
plpgsql variable, but we've sure heard a lot of complaints about that.
Oracle's PL/SQL has the precedence the other way around: resolve first
as the query column, and only failing that as a PL variable. The Oracle
behavior is arguably less surprising because the query-provided names
belong to the nearer enclosing scope. I believe that we ought to move
to the Oracle behavior over time, but how do we get there from here?
Changing it is almost surely going to break a lot of people's functions,
and in rather subtle ways.I think there are basically three behaviors that we could offer:
1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)(Another possibility is to throw a warning but proceed anyway. It would
be easy to do that if we proceed with the Oracle behavior, but *not*
easy if we proceed with the historical PG behavior. The reason is that
the code invoked by transformColumnRef may have already made some
side-effects on the query tree. We discussed the implicit-RTE behavior
yesterday, but there are other effects of a successful name lookup,
such as marking columns for privilege checking.)What I'm wondering about at the moment is which behaviors to offer and
how to control them. The obvious answer is "use a GUC" but that answer
scares me because of the ease with which switching between #1 and #2
would break plpgsql functions. It's not out of the question that that
could even amount to a security problem. I could see using a GUC to
turn the error behavior (#3) on and off, but not to switch between #1
and #2.Another possibility is to control it on a per-function basis by adding some special syntax to plpgsql function bodies to say which behavior to use. We could for instance extend the never-documented "#option" syntax. This is pretty ugly and would be inconvenient to use too --- if people have to go and add "#option something" to a function, they might as well just fix whatever name conflicts it has instead.
I'd suggest two options, one for name resolution (#1 or #2) and one for
error level of ambiguity (none or ERROR).
GUCs are fine, now we have GUC settings per-function.
--
Simon Riggs www.2ndQuadrant.com
On Sun, Oct 18, 2009 at 1:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
As most of you will recall, plpgsql currently acts as though identifiers
in SQL queries should be resolved first as plpgsql variable names, and
only failing that do they get processed as names of the query. The
plpgsql parser rewrite that I'm working on will fix that for the
obviously-silly cases where a plpgsql variable is substituted for a
table name or some other non-scalar-variable identifier. However, what
should we do when a name could represent either a plpgsql variable
or a column of the query? Historically we've resolved it as the
plpgsql variable, but we've sure heard a lot of complaints about that.
Oracle's PL/SQL has the precedence the other way around: resolve first
as the query column, and only failing that as a PL variable. The Oracle
behavior is arguably less surprising because the query-provided names
belong to the nearer enclosing scope. I believe that we ought to move
to the Oracle behavior over time, but how do we get there from here?
Changing it is almost surely going to break a lot of people's functions,
and in rather subtle ways.I think there are basically three behaviors that we could offer:
1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)(Another possibility is to throw a warning but proceed anyway. It would
be easy to do that if we proceed with the Oracle behavior, but *not*
easy if we proceed with the historical PG behavior. The reason is that
the code invoked by transformColumnRef may have already made some
side-effects on the query tree. We discussed the implicit-RTE behavior
yesterday, but there are other effects of a successful name lookup,
such as marking columns for privilege checking.)What I'm wondering about at the moment is which behaviors to offer and
how to control them. The obvious answer is "use a GUC" but that answer
scares me because of the ease with which switching between #1 and #2
would break plpgsql functions. It's not out of the question that that
could even amount to a security problem. I could see using a GUC to
turn the error behavior (#3) on and off, but not to switch between #1
and #2.Another possibility is to control it on a per-function basis by adding some special syntax to plpgsql function bodies to say which behavior to use. We could for instance extend the never-documented "#option" syntax. This is pretty ugly and would be inconvenient to use too --- if people have to go and add "#option something" to a function, they might as well just fix whatever name conflicts it has instead.I'm not seeing any choice that seems likely to make everybody happy.
Any comments or ideas?
If we just change the default behavior from #1 to #2, it's going to be
insanely easy to dump a database using pg_dump for 8.4, restore into
an 8.5 database, and end up with a function that does something
different and broken. So I'm opposed to that plan, but amenable to
any of the other options in varying degrees.
I think it would make a fair amount of sense to make #3 the default behavior.
If possible, I think we should try to engineer things so that using
pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
database produces a function with identical semantics.
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
If possible, I think we should try to engineer things so that using
pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
database produces a function with identical semantics.
Hmm ... actually, we could have pg_dump stick either a #option line
or a GUC SET parameter onto every plpgsql function it pulls from an
old database. So if you're willing to assume that people do their
upgrades that way, it could be made reasonably safe, even if the
default behavior changes.
regards, tom lane
On Sun, Oct 18, 2009 at 4:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
If possible, I think we should try to engineer things so that using
pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
database produces a function with identical semantics.Hmm ... actually, we could have pg_dump stick either a #option line
or a GUC SET parameter onto every plpgsql function it pulls from an
old database. So if you're willing to assume that people do their
upgrades that way, it could be made reasonably safe, even if the
default behavior changes.
I'm not completely willing to assume that. I know we recommend it,
but I'm sure that people don't always do it. And this is pretty
subtle: someone might screw it up and get a non-working function
without realizing it. The advantage of making the default behavior
"throw an error" is that it should be pretty obvious if you've broken
something.
And this isn't just about pg_dump, either. I have a script that gets
run regularly on one of my production databases that goes an updates
the definitions of a whole bunch of functions to the latest version
from the source code repository. Even if pg_dump DTRT, the next run
of a script of that type might subtly break a bunch of stuff.
The current behavior is a trap for the unwary, so I have no problem
with changing it. But I think we should try really hard to avoid
creating a more subtle trap in the process.
...Robert
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
I think there are basically three behaviors that we could offer:
1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)
4. Resolve ambiguous names as query column, but throw warning
#4 would be my vote, followed by #3. To be perfectly honest, I'd be a
whole lot happier with a pl/pgsql that let me prefix variable names with
a '$' or similar to get away from this whole nonsense. I've been very
tempted to tell everyone I work with to start prefixing their variables
names with '_' except that it ends up looking just plain ugly.
Thanks,
Stephen
On Mon, Oct 19, 2009 at 10:54 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
I think there are basically three behaviors that we could offer:
1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)4. Resolve ambiguous names as query column, but throw warning
#4 would be my vote, followed by #3. To be perfectly honest, I'd be a
whole lot happier with a pl/pgsql that let me prefix variable names with
a '$' or similar to get away from this whole nonsense. I've been very
tempted to tell everyone I work with to start prefixing their variables
names with '_' except that it ends up looking just plain ugly.
I think warnings are too easy to miss, but I agree your other
suggestion. I know you can write function_name.variable_name, but
that's often massively long-winded. We either need a short, fixed
prefix, or some kind of sigil. I previously suggested ? to parallel
ECPG, but Tom didn't like it. I still do. :-)
...Robert
On Oct 19, 2009, at 7:54 AM, Stephen Frost wrote:
4. Resolve ambiguous names as query column, but throw warning
#4 would be my vote, followed by #3. To be perfectly honest, I'd be a
whole lot happier with a pl/pgsql that let me prefix variable names
with
a '$' or similar to get away from this whole nonsense. I've been very
tempted to tell everyone I work with to start prefixing their
variables
names with '_' except that it ends up looking just plain ugly.
+1, just what I was thinking.
Best,
David
On Oct 19, 2009, at 8:36 AM, Robert Haas wrote:
I think warnings are too easy to miss, but I agree your other
suggestion. I know you can write function_name.variable_name, but
that's often massively long-winded. We either need a short, fixed
prefix, or some kind of sigil. I previously suggested ? to parallel
ECPG, but Tom didn't like it. I still do. :-)
I suppose that $ would interfere with dollar quoting. What about @ or
@@ (sorry, I did mess with MSSQL back in the 90s).
Hrm…PostgreSQL is starting to have the same problem as Perl: running
out of characters because they're used for operators. :var would be
perfect, if it wasn't for psql. ?var is okay, I guess, if a bit…
questionable. Are {braces} used for anything?
Best,
David
* David E. Wheeler (david@kineticode.com) wrote:
On Oct 19, 2009, at 8:36 AM, Robert Haas wrote:
I think warnings are too easy to miss, but I agree your other
suggestion. I know you can write function_name.variable_name, but
that's often massively long-winded. We either need a short, fixed
prefix, or some kind of sigil. I previously suggested ? to parallel
ECPG, but Tom didn't like it. I still do. :-)I suppose that $ would interfere with dollar quoting. What about @ or @@
(sorry, I did mess with MSSQL back in the 90s).
Uh, what dollar quoting? $_$ is what I typically use, so I wouldn't
expect a $ prefix to cause a problem. I think it'd be more of an issue
because pl/pgsql still uses $1 and whatnot internally (doesn't it?).
Stephen
On Oct 19, 2009, at 9:29 AM, Stephen Frost wrote:
Uh, what dollar quoting? $_$ is what I typically use, so I wouldn't
expect a $ prefix to cause a problem. I think it'd be more of an
issue
because pl/pgsql still uses $1 and whatnot internally (doesn't it?).
Yes, but that's no more an issue than it is in Perl, where the same $n
variables are globals. The issue with dollar quoting is that you can
put anything between the dollar signs. So if you have two $variables,
they can get in the way. Potentially. But perhaps the lexer and/or
Parser won't be confused by that, Tom?
I'd sure love $, as it's like shell, Perl, and other stuff.
Best,
David
"David E. Wheeler" <david@kineticode.com> writes:
I'd sure love $, as it's like shell, Perl, and other stuff.
This discussion has gotten utterly off track. The problem I am trying
to solve is a non-Oracle-compatible behavior in plpgsql. I have got
substantially less than zero interest in proposals that "solve" the
problem by introducing notations that don't even pretend to be
compatible.
regards, tom lane
On Oct 19, 2009, at 9:49 AM, Tom Lane wrote:
I'd sure love $, as it's like shell, Perl, and other stuff.
This discussion has gotten utterly off track. The problem I am trying
to solve is a non-Oracle-compatible behavior in plpgsql. I have got
substantially less than zero interest in proposals that "solve" the
problem by introducing notations that don't even pretend to be
compatible.
Party pooper.
I'd be in favor of a GUC that I could turn on to throw an error when
there's an ambiguity. As for which way it should go, I have no dog in
that pony hunt. Or something.
Best,
David
"David E. Wheeler" <david@kineticode.com> wrote:
I'd be in favor of a GUC that I could turn on to throw an error
when there's an ambiguity.
I would consider hiding one definition with another very bad form, so
I would prefer to have plpgsql throw an error when that happens. I
don't particularly care whether that is the only supported behavior or
whether there's a GUC to control it, or what its default is, if
present.
-Kevin
2009/10/19 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
"David E. Wheeler" <david@kineticode.com> wrote:
I'd be in favor of a GUC that I could turn on to throw an error
when there's an ambiguity.I would consider hiding one definition with another very bad form, so
I would prefer to have plpgsql throw an error when that happens. I
don't particularly care whether that is the only supported behavior or
whether there's a GUC to control it, or what its default is, if
present.
ambiguous identifiers is probably the top reason of some plpgsql's
mysterious errors. More times I found wrong code - sometime really
important (some security checks). I never found good code with
ambiguous identifiers - so for me, exception is good. But - there will
be lot of working applications that contains this hidden bug - and
works "well". So it could be a problem. GUC should be a solution.
Pavel
Show quoted text
-Kevin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 19, 2009 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David E. Wheeler" <david@kineticode.com> writes:
I'd sure love $, as it's like shell, Perl, and other stuff.
This discussion has gotten utterly off track. The problem I am trying
to solve is a non-Oracle-compatible behavior in plpgsql. I have got
substantially less than zero interest in proposals that "solve" the
problem by introducing notations that don't even pretend to be
compatible.
Personally, I'd vote against a GUC option. I just plain don't like the
idea that a function could do different things depending on server
configuration. TBH, I'm not very happy with #option either. That
said, I agree that Oracle method is far better.
Maybe invent a new language handler? plpgsql2 or shorten to pgsql?
Now you can mess around all you want (and maybe fix some other
compatibility warts at the same time).
merlin
On Mon, Oct 19, 2009 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David E. Wheeler" <david@kineticode.com> writes:
I'd sure love $, as it's like shell, Perl, and other stuff.
This discussion has gotten utterly off track. The problem I am trying
to solve is a non-Oracle-compatible behavior in plpgsql. I have got
substantially less than zero interest in proposals that "solve" the
problem by introducing notations that don't even pretend to be
compatible.
OK. In that case, it seems like we should offer options #2 and #3
with a GUC or #option to switch between them. Nobody has made an
argument in favor of keeping #1 around. I'm still strongly of the
opinion that #3 (error) should be the default behavior to avoid silent
failures.
...Robert
Pavel Stehule <pavel.stehule@gmail.com> writes:
ambiguous identifiers is probably the top reason of some plpgsql's
mysterious errors. More times I found wrong code - sometime really
important (some security checks). I never found good code with
ambiguous identifiers - so for me, exception is good. But - there will
be lot of working applications that contains this hidden bug - and
works "well". So it could be a problem. GUC should be a solution.
So the conclusions so far are:
(a) Nobody but me is afraid of the consequences of treating this as
a GUC. (I still think you're all wrong, but so be it.)
(b) Everybody agrees that a "throw error" setting would be helpful.
I am not sure there's any consensus on what the default setting should
be, though. Can we get away with making the default be "throw error"?
What are the probabilities that the OpenACSes of the world will just
set the value to "backward compatible" instead of touching their code?
Do we need/want a hack in pg_dump to attach a SET to functions dumped
from old DBs?
regards, tom lane
On Mon, Oct 19, 2009 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
ambiguous identifiers is probably the top reason of some plpgsql's
mysterious errors. More times I found wrong code - sometime really
important (some security checks). I never found good code with
ambiguous identifiers - so for me, exception is good. But - there will
be lot of working applications that contains this hidden bug - and
works "well". So it could be a problem. GUC should be a solution.So the conclusions so far are:
(a) Nobody but me is afraid of the consequences of treating this as
a GUC. (I still think you're all wrong, but so be it.)
I'm afraid of it, I'm just not sure I have a better idea. It wouldn't
bother me a bit if we made the only available behavior "throw an
error", but I'm afraid it will bother someone else.
Is there a chance we could make this a GUC, but only allow it to be
changed at the function level, with no way to override the server
default? It seems to me that the chances of blowing up the world
would be a lot lower that way, though possibly still not low enough.
(b) Everybody agrees that a "throw error" setting would be helpful.
I am not sure there's any consensus on what the default setting should
be, though. Can we get away with making the default be "throw error"?
What are the probabilities that the OpenACSes of the world will just
set the value to "backward compatible" instead of touching their code?
Do we need/want a hack in pg_dump to attach a SET to functions dumped
from old DBs?
I've already commented on most of these (recap: yes, very high, yes)
so I'll refrain from beating a dead horse.
...Robert
Tom Lane <tgl@sss.pgh.pa.us> wrote:
(a) Nobody but me is afraid of the consequences of treating this as
a GUC.
Well, it seems dangerous to me, but I'm confident we can cover this
within our shop, so I'm reluctant to take a position on it. I guess
the main question is whether we want to allow an Oracle-compatibility
mode, knowing it's a foot-gun. Without it we'd likely make extra work
for someone converting from Oracle to PostgreSQL, although they would
be likely to fix bugs during the cleanup work. Based on previous
decisions I've seen here, I would have expected people to just go with
an error, period; especially since it would simplify the code.
(b) Everybody agrees that a "throw error" setting would be helpful.
That's the only setting I would use on any of our databases, if it
were a GUC.
-Kevin