Call for objections: revision of keyword classification
Since we've already seen two complaints about "timestamp" no longer
being an allowed column name in 7.2, I think it's probably time to
make a serious effort at trimming the reserved-word list a little.
The attached patch de-reserves all these former ColLabels:
ABORT unrestricted
BIT can be ColId, but not function name
CHAR can be ColId, but not function name
CHARACTER can be ColId, but not function name
CLUSTER unrestricted
COPY unrestricted
DEC can be ColId, but not function name
DECIMAL can be ColId, but not function name
EXPLAIN unrestricted
FLOAT can be ColId, but not function name
GLOBAL unrestricted
INOUT unrestricted
INTERVAL can be ColId, but not function name
LISTEN unrestricted
LOAD unrestricted
LOCAL unrestricted
LOCK unrestricted
MOVE unrestricted
NCHAR can be ColId, but not function name
NUMERIC can be ColId, but not function name
OUT unrestricted
PRECISION unrestricted
RESET unrestricted
SETOF can be ColId, but not type or function name
SHOW unrestricted
TIME can be ColId, but not function name
TIMESTAMP can be ColId, but not function name
TRANSACTION unrestricted
UNKNOWN unrestricted
VACUUM unrestricted
VARCHAR can be ColId, but not function name
The ones that are now unrestricted were just low-hanging fruit (ie,
they probably should never have been in ColLabel in the first place).
The rest were fixed by recognizing that just because something couldn't
be a function name didn't mean it couldn't be used as a table or column
name. This solves the fundamental shift/reduce conflict posed by cases
like "SELECT TIMESTAMP(3 ...", without also preventing people from
continuing to name their columns "timestamp".
The keyword classification now looks like:
TypeFuncId: IDENT plus all fully-unrestricted keywords
ColId: TypeFuncId plus type-name keywords that might be
followed by '('; these can't be allowed to be
function names, but they can be column names.
func_name: TypeFuncId plus a few special-case ColLabels
(this list could probably be extended further)
ColLabel: ColId plus everything else
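Here is a minimal Python sketch of how the four nested categories turn into allowed positions. The keyword samples are a few entries taken from this thread (BETWEEN and BINARY as func_name special cases, ALL as ColLabel-only), not the real gram.y lists. The position names ("column", "function", "label") are invented for the illustration.

```python
# Sample entries only; the real lists live in gram.y.
#   TypeFuncId: IDENT plus fully-unrestricted keywords
#   ColId:      TypeFuncId plus type-name keywords that may be followed by '('
#   func_name:  TypeFuncId plus a few special-case ColLabels
#   ColLabel:   ColId plus everything else
CATEGORY = {
    "ABORT": "TypeFuncId",
    "VACUUM": "TypeFuncId",
    "TIMESTAMP": "ColId",
    "VARCHAR": "ColId",
    "BETWEEN": "func_name",
    "BINARY": "func_name",
    "ALL": "ColLabel",
}

# Positions each category may occupy (names invented for this sketch).
USES = {
    "TypeFuncId": {"column", "function", "label"},
    "ColId": {"column", "label"},
    "func_name": {"function", "label"},
    "ColLabel": {"label"},
}


def usable_as(word):
    """Return the set of positions where `word` may appear."""
    category = CATEGORY.get(word.upper())
    if category is None:
        # Not a keyword: a plain IDENT is allowed everywhere.
        return {"column", "function", "label"}
    return set(USES[category])


print(sorted(usable_as("timestamp")))  # ['column', 'label']
print(sorted(usable_as("between")))    # ['function', 'label']
```

Under this scheme TIMESTAMP keeps working as a column name but can no longer start a function call. Plain identifiers and the fully unrestricted keywords can go anywhere.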
Comments? I'd like to apply this, unless there are objections.
I suppose Peter might complain about having to redo the keyword
tables ;-)
regards, tom lane
Seems fine to apply. It would be nice if we had a more general system
for adding keywords and having them be column label/function name
capable. Right now I know I need to add to keyword.c but I have no idea
if/when I need to add to the keyword list in gram.y. Can we move the
keywords out into another file and somehow pull them into gram.y with
the proper attributes so they get into all the places they need to be
with little fiddling?
---------------------------------------------------------------------------
Since we've already seen two complaints about "timestamp" no longer
being an allowed column name in 7.2, I think it's probably time to
make a serious effort at trimming the reserved-word list a little.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
It would be nice if we had a more general system
for adding keywords and having them be column label/function name
capable. Right now I know I need to add to keyword.c but I have no idea
if/when I need to add to the keyword list in gram.y.
*Every* new keyword should be in one of the keyword lists in gram.y.
I tried to clean up the documentation of which list does which and why
in the proposed patch --- what do you think of it?
regards, tom lane
Since we've already seen two complaints about "timestamp" no longer
being an allowed column name in 7.2, I think it's probably time to
make a serious effort at trimming the reserved-word list a little.
Cool.
The only reservation I have (pun not *really* intended ;) is that the
SQL9x reserved words may continue to impact us into the future, so
freeing them up now may just postpone the pain until later. That
probably is not a good enough argument (*I* don't even like it) but any
extra flexibility we put in now is not guaranteed to last forever...
In either case, having reserved words which are also reserved in the SQL
standard will not keep folks from using PostgreSQL, and allowing them
will not be a difference maker in adoption either imho.
- Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes:
The only reservation I have (pun not *really* intended ;) is that the
SQL9x reserved words may continue to impact us into the future, so
freeing them up now may just postpone the pain until later. That
probably is not a good enough argument (*I* don't even like it) but any
extra flexibility we put in now is not guaranteed to last forever...
Of course not, but we might as well do what we can while we can.
One positive point is that (I think) we are pretty close to SQL9x now
on datatype declaration syntax, so if we can make these words unreserved
or less-reserved today, it's not unreasonable to think they might be
able to stay that way indefinitely.
In either case, having reserved words which are also reserved in the SQL
standard will not keep folks from using PostgreSQL, and allowing them
will not be a difference maker in adoption either imho.
No, it won't. I'm mainly doing this to try to minimize the pain of
people porting forward from previous Postgres releases, in which
(some of) these words weren't reserved. That seems a worthwhile
goal to me, even if in the long run they end up absorbing the pain
anyway. Certain pain now vs maybe-or-maybe-not pain later is an
easy tradeoff ;-)
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Can we move the keywords out into another file and somehow pull them
into gram.y with the proper attributes so they get into all the places
they need to be with little fiddling?
Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:
AS Hard-reserved
CASE ColLabel
ABSOLUTE TypeFuncId
BIT ColId
and make some scripts that generate both keyword.c and the list
productions in gram.y automatically. (Among other things, we could stop
trusting manual sorting of the keyword.c entries ...) Peter's
documentation generator would no doubt be a lot happier too --- we
could add indications of SQL92 and SQL99 reserved status to this
master file, for example.
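To illustrate the generator idea, here is a Python sketch that reads a master file in the format shown above and emits sorted entries for the keyword table in keyword.c. Several details are assumptions chosen for illustration rather than features of any existing tool: the one-line `NAME Category` format, the `#` comments, and the rule that each token is named after its upper-cased keyword. Because the script does the sorting, nobody has to keep the table in order by hand.

```python
MASTER = """\
AS        Hard-reserved
CASE      ColLabel
ABSOLUTE  TypeFuncId
BIT       ColId
"""


def parse_master(text):
    """Parse 'KEYWORD Category' lines into a dict, rejecting duplicates."""
    keywords = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].strip()  # allow '#' comments
        if not line:
            continue
        name, category = line.split()  # malformed lines raise ValueError
        if name in keywords:
            raise ValueError(f"line {lineno}: duplicate keyword {name}")
        keywords[name] = category
    return keywords


def keyword_c_entries(keywords):
    """Emit table entries sorted by lower-cased keyword name.

    Assumes the token name is simply the upper-cased keyword, which is
    not necessarily true for every entry in the real table.
    """
    return [f'\t{{"{name.lower()}", {name}}},'
            for name in sorted(keywords, key=str.lower)]


print("\n".join(keyword_c_entries(parse_master(MASTER))))
```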
However, right offhand I don't see any equivalent of #include in the
Bison manual, so I'm not sure how the autogenerated list productions
could be included into the hand-maintained part of gram.y. Thoughts?
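The other half of such a generator would emit the list productions themselves. The following Python sketch does that. The nonterminal names are hypothetical, and the actions follow the `{ $$ = "word"; }` pattern of gram.y's keyword productions. Because the action string is derived from the token name, a production such as `FOO { $$ = "bar"; }` cannot be generated by mistake. Splicing the text into gram.y is a separate problem, and that is the open question raised here.

```python
from collections import defaultdict

# Hypothetical master-file contents: keyword -> category.
KEYWORDS = {
    "ABSOLUTE": "TypeFuncId",
    "ABORT": "TypeFuncId",
    "BIT": "ColId",
    "TIMESTAMP": "ColId",
    "CASE": "ColLabel",
}


def list_productions(keywords, nonterminals):
    """Render one bison list production per category.

    `nonterminals` maps a category to the (hypothetical) nonterminal name
    to emit; categories not in the map are skipped.
    """
    by_category = defaultdict(list)
    for name, category in keywords.items():
        by_category[category].append(name)
    chunks = []
    for category, nonterminal in nonterminals.items():
        names = sorted(by_category[category])
        if not names:
            continue
        # The action string comes from the token name, so a production
        # like  FOO { $$ = "bar"; }  cannot be produced.
        alts = [f'{n} {{ $$ = "{n.lower()}"; }}' for n in names]
        chunks.append(f"{nonterminal}:\n\t\t  " + "\n\t\t| ".join(alts) + "\n\t\t;")
    return "\n\n".join(chunks)


print(list_productions(KEYWORDS, {"TypeFuncId": "typefuncid_keyword",
                                  "ColId": "colid_keyword"}))
```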
regards, tom lane
PS: no, I'm *not* suggesting we do this during beta.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Can we move the keywords out into another file and somehow pull them
into gram.y with the proper attributes so they get into all the places
they need to be with little fiddling?
Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:
AS Hard-reserved
CASE ColLabel
ABSOLUTE TypeFuncId
BIT ColId
and make some scripts that generate both keyword.c and the list
productions in gram.y automatically. (Among other things, we could stop
trusting manual sorting of the keyword.c entries ...) Peter's
documentation generator would no doubt be a lot happier too --- we
could add indications of SQL92 and SQL99 reserved status to this
master file, for example.
However, right offhand I don't see any equivalent of #include in the
Bison manual, so I'm not sure how the autogenerated list productions
could be included into the hand-maintained part of gram.y. Thoughts?
Yes, this is what I was suggesting; a central file that can be pulled
in to generate the others.
Doesn't bison deal with #include? I guess not. The only other way is
to make a gram.y.pre, and have Makefile do the inclusions in the proper
spot, and run that new gram.y through bison. The fact is, you have to
process the central file anyway so may as well just do the gram.y
replacements manually too.
--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Doesn't bison deal with #include? I guess not. The only other way is
to make a gram.y.pre, and have Makefile do the inclusions in the proper
spot, and run that new gram.y through bison.
I was hoping to avoid that sort of kluge ... surely the bison designers
thought of include, and I'm just not seeing how it's done ...
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Doesn't bison deal with #include? I guess not. The only other way is
to make a gram.y.pre, and have Makefile do the inclusions in the proper
spot, and run that new gram.y through bison.
I was hoping to avoid that sort of kluge ... surely the bison designers
thought of include, and I'm just not seeing how it's done ...
What does #include do? Doesn't it work?
--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
What does #include do? Doesn't it work?
AFAICT it's only allowed in the C-code sections of gram.y, from which
it's just transposed into the output .c file (as indeed you'd want;
you wouldn't want your header files expanded when bison is run).
I'm not seeing anything that supports inclusion of a file in the
grammar-productions portion of gram.y.
regards, tom lane
How do some of the other RDBMSs handle this? I got into the habit
a while ago of not using 'field types' as 'field names', so not using
something like 'timestamp' as a field name comes naturally ... ignoring
going from old-PgSQL to new-PgSQL ... what about PgSQL->Oracle? I
personally like it when I see apps out there that strive to work with
different DBs, I'd hate to see it be us that makes life more difficult for
ppl to make choices because we 'softened restrictions' on reserved words,
allowing someone to create an app that works great under us, but is now a
headache to change to someone else's RDBMSs as a result ...
... if that makes any sense?
On Thu, 8 Nov 2001, Tom Lane wrote:
Thomas Lockhart <lockhart@fourpalms.org> writes:
The only reservation I have (pun not *really* intended ;) is that the
SQL9x reserved words may continue to impact us into the future, so
freeing them up now may just postpone the pain until later. That
probably is not a good enough argument (*I* don't even like it) but any
extra flexibility we put in now is not guaranteed to last forever...
Of course not, but we might as well do what we can while we can.
One positive point is that (I think) we are pretty close to SQL9x now
on datatype declaration syntax, so if we can make these words unreserved
or less-reserved today, it's not unreasonable to think they might be
able to stay that way indefinitely.
In either case, having reserved words which are also reserved in the SQL
standard will not keep folks from using PostgreSQL, and allowing them
will not be a difference maker in adoption either imho.
No, it won't. I'm mainly doing this to try to minimize the pain of
people porting forward from previous Postgres releases, in which
(some of) these words weren't reserved. That seems a worthwhile
goal to me, even if in the long run they end up absorbing the pain
anyway. Certain pain now vs maybe-or-maybe-not pain later is an
easy tradeoff ;-)
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
What does #include do? Doesn't it work?
AFAICT it's only allowed in the C-code sections of gram.y, from which
it's just transposed into the output .c file (as indeed you'd want;
you wouldn't want your header files expanded when bison is run).
I'm not seeing anything that supports inclusion of a file in the
grammar-productions portion of gram.y.
It would be very easy to simulate the #include in the action section
using a small awk script. I can assist.
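Bruce's suggestion could be prototyped along the following lines. This sketch is written in Python rather than awk. It expands lines of the form `%include "file"` in a gram.y.pre file before bison sees the result. The `%include` spelling is made up, since the problem is precisely that bison offers nothing like it in the rules section. `read_file` is passed in so that the function can be tested without touching the filesystem.

```python
import re

# Hypothetical directive, recognized only by this preprocessing step.
INCLUDE = re.compile(r'^\s*%include\s+"([^"]+)"\s*$')


def expand_includes(text, read_file):
    """Replace each '%include "name"' line with the contents of that file.

    `read_file` is any callable mapping a name to its text. Nested
    includes are expanded too, with a simple guard against cycles.
    """
    def expand(src, stack):
        out = []
        for line in src.splitlines(keepends=True):
            m = INCLUDE.match(line)
            if not m:
                out.append(line)
                continue
            name = m.group(1)
            if name in stack:
                raise ValueError(f"include cycle through {name}")
            body = expand(read_file(name), stack + (name,))
            if body and not body.endswith("\n"):
                body += "\n"  # keep the next line from being glued on
            out.append(body)
        return "".join(out)

    return expand(text, ())
```

A Makefile rule could run this over gram.y.pre with something like `read_file=lambda n: pathlib.Path(n).read_text()` and hand the output to bison.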
--
Bruce Momjian
"Marc G. Fournier" <scrappy@hub.org> writes:
I'd hate to see it be us that makes life more difficult for
ppl to make choices because we 'softened restrictions' on reserved words,
allowing someone to create an app that works great under us, but is now a
headache to change to someone else's RDBMSs as a result ...
Well, I could see making a "strict SQL" mode that rejects *all* PG-isms,
but in the absence of such a thing I don't see much value to taking a
hard line just on the point of disallowing keywords as field names.
That seems unlikely to be anyone's worst porting headache ...
Your question is valid though: do other RDBMSs take a hard line on
how reserved keywords are? I dunno.
regards, tom lane
...
Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:
...
and make some scripts that generate both keyword.c and the list
productions in gram.y automatically. (Among other things, we could stop
trusting manual sorting of the keyword.c entries ...) Peter's
documentation generator would no doubt be a lot happier too --- we
could add indications of SQL92 and SQL99 reserved status to this
master file, for example.
istm that we would have a better time using gram.y as the definitive
source for this list. Trying to stuff gram.y from some other source file
moves the information another step away from bison, which is the
definitive arbiter of correct behavior and syntax. Complaints that
things are too hard to figure out won't get better by having more
indirection in the process, and no matter how we do it one will still
need to understand the relationships between tokens and productions.
We could have a perl script (haven't looked; maybe Peter's utility
already does this?) which rummages through gram.y and generates
keyword.c. And if we wanted to categorize what we implement wrt SQL9x
definitions, we should do a join from lists in SQL9x against our
keywords, rather than trying to maintain that relationship manually. We
could even find some database to do it for us ;)
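The join Thomas describes is easy to sketch with Python sets. The keyword samples and category labels below are a small hand-picked subset for illustration. Real inputs would be the full reserved-word lists from the standards plus the classification from gram.y. Only one standard appears here, but others can be added as extra entries in `STANDARDS`.

```python
# Small illustrative subsets only; real inputs would be the full
# reserved-word lists from the standards and the lists in gram.y.
OURS = {
    "AS": "reserved",
    "CASE": "reserved",
    "TIMESTAMP": "col_name",
    "VARCHAR": "col_name",
    "VACUUM": "unreserved",
}
STANDARDS = {
    "SQL92": {"AS", "CASE", "TIMESTAMP", "VARCHAR"},
}


def reserved_status(ours, standards):
    """Join our classification with each standard's reserved-word set.

    Returns {keyword: (our category, status in each standard, ...)} for
    every keyword known to either side; '-' means "not listed".
    """
    words = set(ours).union(*standards.values())
    table = {}
    for word in sorted(words):
        flags = tuple("reserved" if word in reserved else "-"
                      for reserved in standards.values())
        table[word] = (ours.get(word, "-"),) + flags
    return table


table = reserved_status(OURS, STANDARDS)
# Standard-reserved words that we accept in more places than the standard does.
lenient = [w for w, (cat, sql92) in table.items()
           if sql92 == "reserved" and cat != "reserved"]
print(lenient)  # ['TIMESTAMP', 'VARCHAR']
```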
- Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes:
Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:
istm that we would have a better time using gram.y as the definitive
source for this list.
That's what we're doing now, more or less, and it's got glaring
deficiencies. It's nearly unintelligible (cf Bruce's complaint
earlier in this thread) and it's horribly prone to human error.
Here are just three depressingly-easy-to-make mistakes against
which we have no mechanical check:
* keyword production mismatches token and action, eg
| FOO { $$ = "bar"; }
* failure to add new keyword to any of the appropriate lists;
* messing up the perfect sort order required in keyword.c.
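A script can at least catch the first of these mistakes. The Python sketch below assumes that each keyword alternative sits on its own line in the `| FOO { $$ = "foo"; }` form shown above. Some tokens have names that legitimately differ from their keyword (ABORT_TRANS is used here only as an example), and these must be listed as exceptions.

```python
import re

# One keyword alternative per line, e.g.  | FOO { $$ = "foo"; }
# The leading ':' or '|' is optional so the first alternative matches too.
ALT = re.compile(
    r'^\s*[:|]?\s*([A-Z_][A-Z0-9_]*)\s*\{\s*\$\$\s*=\s*"([^"]*)"\s*;\s*\}')


def mismatched_actions(grammar_text, exceptions=None):
    """List (token, action string) pairs where the string is not the keyword.

    By default the expected string is the lower-cased token name;
    `exceptions` maps tokens whose names differ from their keyword to the
    string they should yield.
    """
    exceptions = exceptions or {}
    bad = []
    for line in grammar_text.splitlines():
        m = ALT.match(line)
        if not m:
            continue
        token, action = m.groups()
        if action != exceptions.get(token, token.lower()):
            bad.append((token, action))
    return bad


sample = '''
unreserved_keyword:
\t\t  ABORT_TRANS { $$ = "abort"; }
\t\t| ABSOLUTE { $$ = "absolute"; }
\t\t| FOO { $$ = "bar"; }
\t\t;
'''
print(mismatched_actions(sample, {"ABORT_TRANS": "abort"}))  # [('FOO', 'bar')]
```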
What's worse is that the consequences of these mistakes are relatively
subtle and could escape detection for awhile. I'd like to see mistakes
of this kind become procedurally impossible.
We could have a perl script (haven't looked; maybe Peter's utility
already does this?) which rummages through gram.y and generates
keyword.c.
I believe Peter's already doing some form of this, but gram.y is a
forbiddingly unfriendly form of storage for this information. It'd
be a lot easier and less mistake-prone to start from a *designed*
keyword database and generate the appropriate lists in gram.y.
BTW, another thing in the back of my mind is that we should try to
figure out some way to unify ecpg's SQL grammar with the backend's.
Maintaining that thing is an even bigger headache than getting the
backend's own parser right.
regards, tom lane
That's what we're doing now, more or less, and it's got glaring
deficiencies. It's nearly unintelligible (cf Bruce's complaint
earlier in this thread) and it's horribly prone to human error.
Here are just three depressingly-easy-to-make mistakes against
which we have no mechanical check:
Zounds! How could this ever have worked??!! ;)
What's worse is that the consequences of these mistakes are relatively
subtle and could escape detection for awhile. I'd like to see mistakes
of this kind become procedurally impossible.
No disagreement with the goals...
I believe Peter's already doing some form of this, but gram.y is a
forbiddingly unfriendly form of storage for this information. It'd
be a lot easier and less mistake-prone to start from a *designed*
keyword database and generate the appropriate lists in gram.y.
Certainly gram.y is forbidding to beginners and those who don't spend
much time in the code, but separating blocks of the code into external
files only increases the indirection. One still has to *understand* what
gram.y is doing, and no amount of reorganization will keep one from the
possibility of shift/reduce problems with new productions.
One possibility would be to put better comments into gram.y, and to back
those comments up with a validation script that *could* generate
keyword.c and other cross references. A bit more structure to the
comments and code would enable that I think.
BTW, another thing in the back of my mind is that we should try to
figure out some way to unify ecpg's SQL grammar with the backend's.
Maintaining that thing is an even bigger headache than getting the
backend's own parser right.
That would be nice. Unfortunately that would lead to the main parser
having the same machinations used in ecpg, with separate subroutine
calls for *every* production. Yuck. I wonder if some other structure
would be possible...
- Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes:
BTW, another thing in the back of my mind is that we should try to
figure out some way to unify ecpg's SQL grammar with the backend's.
Maintaining that thing is an even bigger headache than getting the
backend's own parser right.
That would be nice. Unfortunately that would lead to the main parser
having the same machinations used in ecpg, with separate subroutine
calls for *every* production. Yuck.
The thing is that most of the actions in ecpg's grammar could easily be
generated mechanically. My half-baked idea here is some sort of script
that would take the backend grammar, strip out the backend's actions and
replace 'em with mechanically-generated actions that reconstruct the
query string, and finally merge with a small set of hand-maintained
rules that reflect ecpg's distinctive features.
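To make "mechanically-generated actions" concrete, here is a Python sketch that turns the right-hand side of a rule into an action that rebuilds the query text. Several pieces are assumptions for the illustration: the helper names `cat_str` and `make_str` in the generated C, the example rules, and the convention that upper-case symbols are keyword tokens unless they appear in `value_tokens`.

```python
def ecpg_action(rhs, value_tokens=frozenset(), spelling=None):
    """Build a bison action that rebuilds the matched text as a string.

    rhs          -- grammar symbols on the rule's right-hand side
    value_tokens -- upper-case tokens that carry a string value ($n),
                    e.g. identifiers or constants
    spelling     -- keyword tokens whose text differs from their name
    """
    spelling = spelling or {}
    parts = []
    for i, sym in enumerate(rhs, 1):
        if len(sym) >= 2 and sym[0] == sym[-1] == "'":
            # Character literal such as '(' : emit its text.
            parts.append(f'make_str("{sym[1:-1]}")')
        elif sym.isupper() and sym not in value_tokens:
            # Keyword token: emit its spelling.
            parts.append(f'make_str("{spelling.get(sym, sym.lower())}")')
        else:
            # Nonterminal or valued token: its value is already a string.
            parts.append(f"${i}")
    if not parts:
        return '{ $$ = make_str(""); }'
    if len(parts) == 1:
        return f"{{ $$ = {parts[0]}; }}"
    return f"{{ $$ = cat_str({len(parts)}, {', '.join(parts)}); }}"


print(ecpg_action(["VACUUM", "opt_verbose", "relation_name"]))
```

Rules that need ecpg-specific behavior would still come from the small hand-maintained set.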
You're quite right that nothing like this will reduce the amount that
maintainers have to know. But I think it could reduce the amount of
tedious, purely mechanical, and error-prone maintenance work that we
have to do to keep various files and lists in sync.
regards, tom lane
One possibility would be to put better comments into gram.y, and to back
those comments up with a validation script that *could* generate
keyword.c and other cross references. A bit more structure to the
comments and code would enable that I think.
A validation script is a good intermediate idea, similar to the
duplicate_oids script we have in include/catalog. It would make sure
keywords.c was sorted, and make sure each keyword appeared somewhere in
lists of allowed function/column name productions.
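Once the data has been extracted, the two checks Bruce lists are simple. A Python sketch follows. It assumes the keyword strings have already been pulled out of the keyword table (in file order) and out of each gram.y list. That extraction is the hard part Tom objects to in the next message.

```python
def validate(table_names, list_members):
    """Return a list of problems; an empty list means both checks pass.

    table_names  -- keyword strings in the order they appear in the table
    list_members -- {gram.y list name: set of keyword strings}
    """
    problems = []
    # The table needs perfect sort order, so an equal neighbour
    # (a duplicate entry) is flagged as well.
    for prev, cur in zip(table_names, table_names[1:]):
        if prev >= cur:
            problems.append(f"table out of order: {prev!r} before {cur!r}")
    listed = set().union(*list_members.values())
    for name in table_names:
        if name not in listed:
            problems.append(f"{name!r} is not in any keyword list")
    return problems


lists = {"unreserved_keyword": {"abort", "absolute"},
         "reserved_keyword": {"as"}}
print(validate(["absolute", "abort", "as", "bit"], lists))
```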
--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
A validation script is a good intermediate idea,
IMHO a validation script would be *far* harder than the alternative
I'm proposing, because it'd have to parse and interpret gram.y and
keyword.c. Building a correct-by-construction set of keyword lists
seems much easier than checking their rather messy representation
in those files.
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
A validation script is a good intermediate idea,
IMHO a validation script would be *far* harder than the alternative
I'm proposing, because it'd have to parse and interpret gram.y and
keyword.c. Building a correct-by-construction set of keyword lists
seems much easier than checking their rather messy representation
in those files.
Agreed. It just removed the indirection problem mentioned by Thomas.
--
Bruce Momjian
Tom Lane writes:
The keyword classification now looks like:
TypeFuncId: IDENT plus all fully-unrestricted keywords
ColId: TypeFuncId plus type-name keywords that might be
followed by '('; these can't be allowed to be
function names, but they can be column names.
func_name: TypeFuncId plus a few special-case ColLabels
(this list could probably be extended further)
ColLabel: ColId plus everything else
Comments? I'd like to apply this, unless there are objections.
Is there any reason why ColLabel does not include func_name? All the
tokens listed in func_name are also part of ColLabel.
I suppose Peter might complain about having to redo the keyword
tables ;-)
The question is, do we want to give the user that much detail, or should
we just say
TypeFuncId, ColId -> "non-reserved"
func_name, ColLabel -> "reserved" (along with the explanations in the
text)
The plain reserved/non-reserved scheme makes it easier to match up the
PostgreSQL column with the SQL9x columns, and hopefully fewer users will
nitpick about the details or read the categories as a promise for all
future releases.
--
Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
Is there any reason why ColLabel does not include func_name? All the
tokens listed in func_name are also part of ColLabel.
Can't do it directly (ie, make func_name one of the alternatives for
ColLabel) because that would result in a ton of shift-reduce conflicts:
all the keywords in TypeFuncId would have two ways to be reduced to
ColLabel (via ColId or via func_name). We could restructure things
by adding an auxiliary category:
func_name: TypeFuncId | func_name_keywords;
func_name_keywords: BETWEEN | BINARY | ... ;
ColLabel: ColId | func_name_keywords | ALL | ANALYSE | ... ;
but I'm not convinced that that's materially cleaner. Comments?
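The conflict can be seen by counting the chains of unit productions that take a keyword up to ColLabel. In this Python sketch each nonterminal lists its single-symbol alternatives. The keyword samples are illustrative, and the grammar fragments are assumed to be acyclic. When `func_name` is a direct alternative, a TypeFuncId keyword reaches ColLabel in two ways. With the auxiliary `func_name_keywords` category, it reaches ColLabel only once.

```python
def count_paths(grammar, start, terminal):
    """Count chains of unit productions leading from `start` to `terminal`.

    `grammar` maps each nonterminal to its single-symbol alternatives;
    any symbol that is not a key is a terminal. Assumes no cycles.
    """
    if start == terminal:
        return 1
    if start not in grammar:
        return 0  # a different terminal
    return sum(count_paths(grammar, alt, terminal) for alt in grammar[start])


def ambiguous_terminals(grammar, start):
    """Terminals that can be reduced to `start` in more than one way."""
    terminals = {alt for alts in grammar.values() for alt in alts
                 if alt not in grammar}
    return sorted(t for t in terminals if count_paths(grammar, start, t) > 1)


DIRECT = {  # func_name used directly as a ColLabel alternative
    "ColLabel": ["ColId", "func_name", "ALL"],
    "ColId": ["TypeFuncId", "TIMESTAMP"],
    "func_name": ["TypeFuncId", "BETWEEN"],
    "TypeFuncId": ["IDENT", "ABORT"],
}
AUX = {  # restructured with an auxiliary category
    "ColLabel": ["ColId", "func_name_keywords", "ALL"],
    "ColId": ["TypeFuncId", "TIMESTAMP"],
    "func_name": ["TypeFuncId", "func_name_keywords"],
    "func_name_keywords": ["BETWEEN"],
    "TypeFuncId": ["IDENT", "ABORT"],
}

print(ambiguous_terminals(DIRECT, "ColLabel"))  # ['ABORT', 'IDENT']
print(ambiguous_terminals(AUX, "ColLabel"))     # []
```

Each terminal with two distinct reductions to the same nonterminal is a place where bison reports conflicts.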
The question is, do we want to give the user that much detail, or should
we just say
TypeFuncId, ColId -> "non-reserved"
func_name, ColLabel -> "reserved" (along with the explanations in the
text)
ColId is certainly the most important category for ordinary users, so
I agree that division would be sufficient for most people's purposes.
However ... seems like the point of having this documentation at all
is for it to be complete and accurate. I'd vote for telling the whole
truth, I think.
regards, tom lane
BTW, is there any good reason that AS is not a member of the ColLabel
set? It wouldn't cause a parse conflict to add it (I just tested that)
and it seems like that's a special case we could do without.
regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes:
I found that COALESCE, EXISTS, EXTRACT, NULLIF, POSITION, SUBSTRING, TRIM
can be moved from ColLabel to ColId.
Really? I didn't bother experimenting with anything that had special
a_expr productions ... but now that you mention it, it makes sense that
anything whose special meaning requires a following left paren could
work as a ColId.
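The reasoning about a following left paren amounts to one token of lookahead. The Python sketch below is a toy and not how the bison parser is written. It takes Peter's list of words and uses the next token to decide whether a word starts its special construct or is just a column name.

```python
# Words whose special syntax always starts with '(' (Peter's list).
PAREN_KEYWORDS = {"COALESCE", "EXISTS", "EXTRACT", "NULLIF",
                  "POSITION", "SUBSTRING", "TRIM"}


def classify(tokens, i):
    """Decide from one token of lookahead how tokens[i] is being used."""
    word = tokens[i].upper()
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    if word in PAREN_KEYWORDS:
        return "special construct" if nxt == "(" else "column name"
    return "other"


print(classify(["SELECT", "extract", "(", "year", "FROM", "d", ")"], 1))
print(classify(["SELECT", "extract", "FROM", "t"], 1))
```

TIMESTAMP, from the start of the thread, is the contrasting case. A following '(' does not settle it, because `TIMESTAMP(3` could begin either a type with precision or a function call. That is why TIMESTAMP may be a column name but not a function name.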
Are you recommending that we actually make this change now, or leave
it for a future round of experiments?
regards, tom lane
Tom Lane writes:
ColId is certainly the most important category for ordinary users, so
I agree that division would be sufficient for most people's purposes.
However ... seems like the point of having this documentation at all
is for it to be complete and accurate. I'd vote for telling the whole
truth, I think.
Okay, here's the new definition of truth then:
TypeFuncId => "non-reserved"
ColId => "non-reserved (cannot be function or type)"
func_name => "reserved (can be function)"
ColLabel => "reserved"
This can still be matched well against the SQL 9x columns.
But it gets worse... ;-)
I found that COALESCE, EXISTS, EXTRACT, NULLIF, POSITION, SUBSTRING, TRIM
can be moved from ColLabel to ColId. (This makes sense given the new
definition of ColId as above.) However, I *think* it should be possible
to use these tokens as type names if one were willing to refactor these
lists further. So there's possibly plenty of fun left in this area. ;-)
--
Peter Eisentraut peter_e@gmx.net
Tom Lane writes:
BTW, is there any good reason that AS is not a member of the ColLabel
set? It wouldn't cause a parse conflict to add it (I just tested that)
and it seems like that's a special case we could do without.
Fine with me. I guess I'll wait with the new table a bit yet. ;-)
--
Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
Fine with me. I guess I'll wait with the new table a bit yet. ;-)
The coast is clear now ...
I divided the keywords into four mutually exclusive, all-inclusive
category lists:
unreserved_keyword
col_name_keyword
func_name_keyword
reserved_keyword
which I trust will not be a problem for your document-generating
script.
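Because the four lists are meant to be mutually exclusive and all-inclusive, a script can check that property and map each list onto one documentation label. A Python sketch follows. It assumes the lists line up one-to-one with the four labels Peter proposed earlier in the thread, and the sample keywords are illustrative.

```python
# Assumed correspondence between the four lists and Peter's labels.
DOC_LABEL = {
    "unreserved_keyword": "non-reserved",
    "col_name_keyword": "non-reserved (cannot be function or type)",
    "func_name_keyword": "reserved (can be function)",
    "reserved_keyword": "reserved",
}


def check_partition(lists, all_keywords):
    """Verify every keyword is in exactly one list; return problems found."""
    seen = {}
    problems = []
    for list_name, members in lists.items():
        for kw in members:
            if kw in seen:
                problems.append(f"{kw} is in both {seen[kw]} and {list_name}")
            else:
                seen[kw] = list_name
    for kw in sorted(set(all_keywords) - set(seen)):
        problems.append(f"{kw} is in no list")
    for kw in sorted(set(seen) - set(all_keywords)):
        problems.append(f"{kw} is listed but is not a keyword")
    return problems


def doc_table(lists):
    """Map each keyword to its documentation label."""
    return {kw: DOC_LABEL[name] for name, members in lists.items()
            for kw in members}


lists = {
    "unreserved_keyword": {"ABORT", "VACUUM"},
    "col_name_keyword": {"TIMESTAMP"},
    "func_name_keyword": {"BETWEEN"},
    "reserved_keyword": {"ALL"},
}
print(check_partition(lists, {"ABORT", "VACUUM", "TIMESTAMP", "BETWEEN", "ALL"}))
```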
ecpg has some finer distinctions but I'm happy to leave those
undocumented.
regards, tom lane