Call for objections: revision of keyword classification

Started by Tom Lane, over 24 years ago, 27 messages, list: hackers
#1Tom Lane
tgl@sss.pgh.pa.us

Since we've already seen two complaints about "timestamp" no longer
being an allowed column name in 7.2, I think it's probably time to
make a serious effort at trimming the reserved-word list a little.

The attached patch de-reserves all these former ColLabels:

ABORT        unrestricted
BIT          can be ColId, but not function name
CHAR         can be ColId, but not function name
CHARACTER    can be ColId, but not function name
CLUSTER      unrestricted
COPY         unrestricted
DEC          can be ColId, but not function name
DECIMAL      can be ColId, but not function name
EXPLAIN      unrestricted
FLOAT        can be ColId, but not function name
GLOBAL       unrestricted
INOUT        unrestricted
INTERVAL     can be ColId, but not function name
LISTEN       unrestricted
LOAD         unrestricted
LOCAL        unrestricted
LOCK         unrestricted
MOVE         unrestricted
NCHAR        can be ColId, but not function name
NUMERIC      can be ColId, but not function name
OUT          unrestricted
PRECISION    unrestricted
RESET        unrestricted
SETOF        can be ColId, but not type or function name
SHOW         unrestricted
TIME         can be ColId, but not function name
TIMESTAMP    can be ColId, but not function name
TRANSACTION  unrestricted
UNKNOWN      unrestricted
VACUUM       unrestricted
VARCHAR      can be ColId, but not function name

The ones that are now unrestricted were just low-hanging fruit (ie,
they probably should never have been in ColLabel in the first place).
The rest were fixed by recognizing that just because something couldn't
be a function name didn't mean it couldn't be used as a table or column
name. This solves the fundamental shift/reduce conflict posed by cases
like "SELECT TIMESTAMP(3 ...", without also preventing people from
continuing to name their columns "timestamp".

The keyword classification now looks like:

TypeFuncId:  IDENT plus all fully-unrestricted keywords

ColId:       TypeFuncId plus type-name keywords that might be
             followed by '('; these can't be allowed to be
             function names, but they can be column names.

func_name:   TypeFuncId plus a few special-case ColLabels
             (this list could probably be extended further)

ColLabel:    ColId plus everything else
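
A minimal sketch of how these nested classes behave, written in Python rather than Bison. The three sets hold only a few keywords taken from this thread, the helper names are hypothetical, and the special-case ColLabels that func_name also accepts are left out.

```python
# Illustrative subsets only; the real lists live in gram.y.
TYPE_FUNC_ID = {"ABORT", "COPY", "VACUUM"}       # fully unrestricted keywords
COL_ID_EXTRA = {"BIT", "CHAR", "TIMESTAMP"}      # type names that may be followed by '('
COL_LABEL_EXTRA = {"CASE"}                       # allowed only as a column label

# Each class contains the one before it, as in the classification above.
COL_ID = TYPE_FUNC_ID | COL_ID_EXTRA
COL_LABEL = COL_ID | COL_LABEL_EXTRA


def can_be_column_name(word):
    """Table and column names are ColIds."""
    return word.upper() in COL_ID


def can_be_function_name(word):
    """Simplified func_name: TypeFuncId only, ignoring the special cases."""
    return word.upper() in TYPE_FUNC_ID


def can_be_column_label(word):
    """Column labels accept every keyword in any of the three classes."""
    return word.upper() in COL_LABEL
```

Under this model "timestamp" is accepted as a column name but rejected as a function name, which is the distinction that avoids the "SELECT TIMESTAMP(3 ..." conflict.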

Comments? I'd like to apply this, unless there are objections.
I suppose Peter might complain about having to redo the keyword
tables ;-)

regards, tom lane

#2Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Call for objections: revision of keyword classification

Seems fine to apply. It would be nice if we had a more general system
for adding keywords and having them be column label/function name
capable. Right now I know I need to add to keyword.c but I have no idea
if/when I need to add to the keyword list in gram.y. Can we move the
keywords out into another file and somehow pull them into gram.y with
the proper attributes so they get into all the places they need to be
with little fiddling?

---------------------------------------------------------------------------

Since we've already seen two complaints about "timestamp" no longer
being an allowed column name in 7.2, I think it's probably time to
make a serious effort at trimming the reserved-word list a little.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

It would be nice if we had a more general system
for adding keywords and having them be column label/function name
capable. Right now I know I need to add to keyword.c but I have no idea
if/when I need to add to the keyword list in gram.y.

*Every* new keyword should be in one of the keyword lists in gram.y.

I tried to clean up the documentation of which list does which and why
in the proposed patch --- what do you think of it?

regards, tom lane

#4Thomas Lockhart
lockhart@fourpalms.org
In reply to: Tom Lane (#1)
Re: Call for objections: revision of keyword classification

Since we've already seen two complaints about "timestamp" no longer
being an allowed column name in 7.2, I think it's probably time to
make a serious effort at trimming the reserved-word list a little.

Cool.

The only reservation I have (pun not *really* intended ;) is that the
SQL9x reserved words may continue to impact us into the future, so
freeing them up now may just postpone the pain until later. That
probably is not a good enough argument (*I* don't even like it) but any
extra flexibility we put in now is not guaranteed to last forever...

In either case, having reserved words which are also reserved in the SQL
standard will not keep folks from using PostgreSQL, and allowing them
will not be a difference maker in adoption either imho.

- Thomas

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#4)
Re: Call for objections: revision of keyword classification

Thomas Lockhart <lockhart@fourpalms.org> writes:

The only reservation I have (pun not *really* intended ;) is that the
SQL9x reserved words may continue to impact us into the future, so
freeing them up now may just postpone the pain until later. That
probably is not a good enough argument (*I* don't even like it) but any
extra flexibility we put in now is not guaranteed to last forever...

Of course not, but we might as well do what we can while we can.

One positive point is that (I think) we are pretty close to SQL9x now
on datatype declaration syntax, so if we can make these words unreserved
or less-reserved today, it's not unreasonable to think they might be
able to stay that way indefinitely.

In either case, having reserved words which are also reserved in the SQL
standard will not keep folks from using PostgreSQL, and allowing them
will not be a difference maker in adoption either imho.

No, it won't. I'm mainly doing this to try to minimize the pain of
people porting forward from previous Postgres releases, in which
(some of) these words weren't reserved. That seems a worthwhile
goal to me, even if in the long run they end up absorbing the pain
anyway. Certain pain now vs maybe-or-maybe-not pain later is an
easy tradeoff ;-)

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Can we move the keywords out into another file and somehow pull them
into gram.y with the proper attributes so they get into all the places
they need to be with little fiddling?

Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:

AS        Hard-reserved
CASE      ColLabel
ABSOLUTE  TypeFuncId
BIT       ColId

and make some scripts that generate both keyword.c and the list
productions in gram.y automatically. (Among other things, we could stop
trusting manual sorting of the keyword.c entries ...) Peter's
documentation generator would no doubt be a lot happier too --- we
could add indications of SQL92 and SQL99 reserved status to this
master file, for example.
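
A rough sketch of such a generator, in Python. The master-file layout follows the four-line example above; the exact shape of the emitted keyword.c entries and gram.y alternatives is an assumption made for illustration, as are the function names.

```python
MASTER = """\
AS        Hard-reserved
CASE      ColLabel
ABSOLUTE  TypeFuncId
BIT       ColId
"""


def parse_master(text):
    """Return (keyword, category) pairs from the master keyword file."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        keyword, category = line.split()
        entries.append((keyword, category))
    return entries


def keyword_table(entries):
    """Emit keyword.c-style entries, sorted mechanically by keyword text."""
    names = sorted(keyword.lower() for keyword, _ in entries)
    return "\n".join('\t{"%s", %s},' % (name, name.upper()) for name in names)


def list_production(entries, category):
    """Emit the alternatives of one gram.y keyword list.

    The action string is derived from the token name, so token and
    action cannot disagree.
    """
    words = sorted(k for k, c in entries if c == category)
    return "\n\t| ".join('%s { $$ = "%s"; }' % (w, w.lower()) for w in words)
```

Because both outputs come from the same entries, a keyword cannot be present in one file and missing from the other.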

However, right offhand I don't see any equivalent of #include in the
Bison manual, so I'm not sure how the autogenerated list productions
could be included into the hand-maintained part of gram.y. Thoughts?

regards, tom lane

PS: no, I'm *not* suggesting we do this during beta.

#7Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Can we move the keywords out into another file and somehow pull them
into gram.y with the proper attributes so they get into all the places
they need to be with little fiddling?

Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:

AS        Hard-reserved
CASE      ColLabel
ABSOLUTE  TypeFuncId
BIT       ColId

and make some scripts that generate both keyword.c and the list
productions in gram.y automatically. (Among other things, we could stop
trusting manual sorting of the keyword.c entries ...) Peter's
documentation generator would no doubt be a lot happier too --- we
could add indications of SQL92 and SQL99 reserved status to this
master file, for example.

However, right offhand I don't see any equivalent of #include in the
Bison manual, so I'm not sure how the autogenerated list productions
could be included into the hand-maintained part of gram.y. Thoughts?

Yes, this is what I was suggesting; a central file that can be pulled
in to generate the others.

Doesn't bison deal with #include? I guess not. The only other way is
to make a gram.y.pre, and have Makefile do the inclusions in the proper
spot, and run that new gram.y through bison. The fact is, you have to
process the central file anyway so may as well just do the gram.y
replacements manually too.
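
The gram.y.pre idea could look roughly like this. It is sketched in Python for illustration; the "@include" marker and the function name are hypothetical, and the Makefile would write the result out as gram.y before running bison.

```python
def expand_includes(template, fragments):
    """Replace each '@include NAME' line with the generated text for NAME.

    template  -- contents of gram.y.pre as one string
    fragments -- dict mapping NAME to the generated list productions
    """
    out = []
    for line in template.splitlines():
        stripped = line.strip()
        if stripped.startswith("@include "):
            name = stripped[len("@include "):].strip()
            out.append(fragments[name])   # KeyError flags an unknown fragment
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

A line that names an unknown fragment raises an error instead of being silently dropped, so a typo in gram.y.pre stops the build.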

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Doesn't bison deal with #include? I guess not. The only other way is
to make a gram.y.pre, and have Makefile do the inclusions in the proper
spot, and run that new gram.y through bison.

I was hoping to avoid that sort of kluge ... surely the bison designers
thought of include, and I'm just not seeing how it's done ...

regards, tom lane

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Doesn't bison deal with #include? I guess not. The only other way is
to make a gram.y.pre, and have Makefile do the inclusions in the proper
spot, and run that new gram.y through bison.

I was hoping to avoid that sort of kluge ... surely the bison designers
thought of include, and I'm just not seeing how it's done ...

What does #include do? Doesn't it work?

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

What does #include do? Doesn't it work?

AFAICT it's only allowed in the C-code sections of gram.y, from which
it's just transposed into the output .c file (as indeed you'd want;
you wouldn't want your header files expanded when bison is run).

I'm not seeing anything that supports inclusion of a file in the
grammar-productions portion of gram.y.

regards, tom lane

#11The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#5)
Re: Call for objections: revision of keyword classification

How do some of the other RDBMSs handle this? I got into the habit a
while ago of not using 'field types' as 'field names', so not using
something like 'timestamp' as a field name comes naturally ... ignoring
going from old-PgSQL to new-PgSQL ... what about PgSQL->Oracle? I
personally like it when I see apps out there that strive to work with
different DBs, I'd hate to see it be us that makes life more difficult for
ppl to make choices because we 'softened restrictions' on reserved words,
allowing someone to create an app that works great under us, but is now a
headache to change to someone else's RDBMSs as a result ...

... if that makes any sense?

On Thu, 8 Nov 2001, Tom Lane wrote:

Thomas Lockhart <lockhart@fourpalms.org> writes:

The only reservation I have (pun not *really* intended ;) is that the
SQL9x reserved words may continue to impact us into the future, so
freeing them up now may just postpone the pain until later. That
probably is not a good enough argument (*I* don't even like it) but any
extra flexibility we put in now is not guaranteed to last forever...

Of course not, but we might as well do what we can while we can.

One positive point is that (I think) we are pretty close to SQL9x now
on datatype declaration syntax, so if we can make these words unreserved
or less-reserved today, it's not unreasonable to think they might be
able to stay that way indefinitely.

In either case, having reserved words which are also reserved in the SQL
standard will not keep folks from using PostgreSQL, and allowing them
will not be a difference maker in adoption either imho.

No, it won't. I'm mainly doing this to try to minimize the pain of
people porting forward from previous Postgres releases, in which
(some of) these words weren't reserved. That seems a worthwhile
goal to me, even if in the long run they end up absorbing the pain
anyway. Certain pain now vs maybe-or-maybe-not pain later is an
easy tradeoff ;-)

regards, tom lane


#12Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

What does #include do? Doesn't it work?

AFAICT it's only allowed in the C-code sections of gram.y, from which
it's just transposed into the output .c file (as indeed you'd want;
you wouldn't want your header files expanded when bison is run).

I'm not seeing anything that supports inclusion of a file in the
grammar-productions portion of gram.y.

It would be very easy to simulate the #include in the action section
using a small awk script. I can assist.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#11)
Re: Call for objections: revision of keyword classification

"Marc G. Fournier" <scrappy@hub.org> writes:

I'd hate to see it be us that makes life more difficult for
ppl to make choices because we 'softened restrictions' on reserved words,
allowing someone to create an app that works great under us, but is now a
headache to change to someone else's RDBMSs as a result ...

Well, I could see making a "strict SQL" mode that rejects *all* PG-isms,
but in the absence of such a thing I don't see much value to taking a
hard line just on the point of disallowing keywords as field names.
That seems unlikely to be anyone's worst porting headache ...

Your question is valid though: do other RDBMSs take a hard line on
how reserved keywords are? I dunno.

regards, tom lane

#14Thomas Lockhart
lockhart@fourpalms.org
In reply to: Bruce Momjian (#2)
Re: Call for objections: revision of keyword classification

...

Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:

...

and make some scripts that generate both keyword.c and the list
productions in gram.y automatically. (Among other things, we could stop
trusting manual sorting of the keyword.c entries ...) Peter's
documentation generator would no doubt be a lot happier too --- we
could add indications of SQL92 and SQL99 reserved status to this
master file, for example.

istm that we would have a better time using gram.y as the definitive
source for this list. Trying to stuff gram.y from some other source file
moves the information another step away from bison, which is the
definitive arbiter of correct behavior and syntax. Complaints that
things are too hard to figure out won't get better by having more
indirection in the process, and no matter how we do it one will still
need to understand the relationships between tokens and productions.

We could have a perl script (haven't looked; maybe Peter's utility
already does this?) which rummages through gram.y and generates
keyword.c. And if we wanted to categorize what we implement wrt SQL9x
definitions, we should do a join from lists in SQL9x against our
keywords, rather than trying to maintain that relationship manually. We
could even find some database to do it for us ;)
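
The join could be sketched with an in-memory SQLite database, as below. The function name and table names are hypothetical, and the caller supplies both word lists; nothing here is the actual SQL9x list.

```python
import sqlite3


def classify(our_keywords, sql92_reserved):
    """Return (word, is_reserved_in_sql92) for each of our keywords."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ours (word TEXT PRIMARY KEY)")
    con.execute("CREATE TABLE sql92 (word TEXT PRIMARY KEY)")
    con.executemany("INSERT INTO ours VALUES (?)",
                    [(w,) for w in our_keywords])
    con.executemany("INSERT INTO sql92 VALUES (?)",
                    [(w,) for w in sql92_reserved])
    # LEFT JOIN keeps every one of our keywords; a missing match means
    # the word is not in the standard's list.
    rows = con.execute(
        "SELECT ours.word, sql92.word IS NOT NULL "
        "FROM ours LEFT JOIN sql92 ON ours.word = sql92.word "
        "ORDER BY ours.word"
    ).fetchall()
    con.close()
    return [(word, bool(flag)) for word, flag in rows]
```

For example, classify(["VACUUM", "TIMESTAMP"], ["TIMESTAMP", "SELECT"]) reports TIMESTAMP as reserved in the supplied list and VACUUM as not.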

- Thomas

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#14)
Re: Call for objections: revision of keyword classification

Thomas Lockhart <lockhart@fourpalms.org> writes:

Thinking about that, it seems like it might be nice to have a master
keyword file that contains just keywords and classifications:

istm that we would have a better time using gram.y as the definitive
source for this list.

That's what we're doing now, more or less, and it's got glaring
deficiencies. It's nearly unintelligible (cf Bruce's complaint
earlier in this thread) and it's horribly prone to human error.
Here are just three depressingly-easy-to-make mistakes against
which we have no mechanical check:

* keyword production mismatches token and action, eg

| FOO { $$ = "bar"; }

* failure to add new keyword to any of the appropriate lists;

* messing up the perfect sort order required in keyword.c.

What's worse is that the consequences of these mistakes are relatively
subtle and could escape detection for awhile. I'd like to see mistakes
of this kind become procedurally impossible.
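
To illustrate why a sort-order slip is subtle, here is a small Python sketch. It assumes the keyword lookup is a binary search over the sorted table, which is what the sort-order requirement suggests; the table contents and function name are made up for the example.

```python
from bisect import bisect_left


def lookup(table, word):
    """Binary search: only correct if table is perfectly sorted."""
    i = bisect_left(table, word)
    return i < len(table) and table[i] == word


sorted_table = ["abort", "bit", "char", "copy", "vacuum"]
broken_table = ["abort", "char", "bit", "copy", "vacuum"]  # "bit" and "char" swapped

# With the sorted table every keyword is found.
found_sorted = [w for w in sorted_table if lookup(sorted_table, w)]

# With two entries swapped, only the misplaced words go missing;
# everything else still works, so the mistake can go unnoticed.
found_broken = [w for w in sorted_table if lookup(broken_table, w)]
```

Here found_broken still contains "abort", "copy" and "vacuum"; only "bit" and "char" stop being recognized as keywords.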

We could have a perl script (haven't looked; maybe Peter's utility
already does this?) which rummages through gram.y and generates
keyword.c.

I believe Peter's already doing some form of this, but gram.y is a
forbiddingly unfriendly form of storage for this information. It'd
be a lot easier and less mistake-prone to start from a *designed*
keyword database and generate the appropriate lists in gram.y.

BTW, another thing in the back of my mind is that we should try to
figure out some way to unify ecpg's SQL grammar with the backend's.
Maintaining that thing is an even bigger headache than getting the
backend's own parser right.

regards, tom lane

#16Thomas Lockhart
lockhart@fourpalms.org
In reply to: Bruce Momjian (#2)
Re: Call for objections: revision of keyword classification

That's what we're doing now, more or less, and it's got glaring
deficiencies. It's nearly unintelligible (cf Bruce's complaint
earlier in this thread) and it's horribly prone to human error.
Here are just three depressingly-easy-to-make mistakes against
which we have no mechanical check:

Zounds! How could this ever have worked??!! ;)

What's worse is that the consequences of these mistakes are relatively
subtle and could escape detection for awhile. I'd like to see mistakes
of this kind become procedurally impossible.

No disagreement with the goals...

I believe Peter's already doing some form of this, but gram.y is a
forbiddingly unfriendly form of storage for this information. It'd
be a lot easier and less mistake-prone to start from a *designed*
keyword database and generate the appropriate lists in gram.y.

Certainly gram.y is forbidding to beginners and those who don't spend
much time in the code, but separating blocks of the code into external
files only increases the indirection. One still has to *understand* what
gram.y is doing, and no amount of reorganization will keep one from the
possibility of shift/reduce problems with new productions.

One possibility would be to put better comments into gram.y, and to back
those comments up with a validation script that *could* generate
keyword.c and other cross references. A bit more structure to the
comments and code would enable that I think.

BTW, another thing in the back of my mind is that we should try to
figure out some way to unify ecpg's SQL grammar with the backend's.
Maintaining that thing is an even bigger headache than getting the
backend's own parser right.

That would be nice. Unfortunately that would lead to the main parser
having the same machinations used in ecpg, with separate subroutine
calls for *every* production. Yuck. I wonder if some other structure
would be possible...

- Thomas

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#16)
Re: Call for objections: revision of keyword classification

Thomas Lockhart <lockhart@fourpalms.org> writes:

BTW, another thing in the back of my mind is that we should try to
figure out some way to unify ecpg's SQL grammar with the backend's.
Maintaining that thing is an even bigger headache than getting the
backend's own parser right.

That would be nice. Unfortunately that would lead to the main parser
having the same machinations used in ecpg, with separate subroutine
calls for *every* production. Yuck.

The thing is that most of the actions in ecpg's grammar could easily be
generated mechanically. My half-baked idea here is some sort of script
that would take the backend grammar, strip out the backend's actions and
replace 'em with mechanically-generated actions that reconstruct the
query string, and finally merge with a small set of hand-maintained
rules that reflect ecpg's distinctive features.
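
A very rough Python sketch of the mechanical part. It assumes the backend actions have already been stripped and each rule is available as a list of right-hand-side symbols; join_tokens is a hypothetical helper that concatenates the text of its arguments, and the rule name in the usage note is invented.

```python
def echo_action(rhs_symbols):
    """Build an action that reassembles the text of every RHS symbol."""
    if not rhs_symbols:
        return '{ $$ = ""; }'
    args = ", ".join("$%d" % i for i in range(1, len(rhs_symbols) + 1))
    return "{ $$ = join_tokens(%d, %s); }" % (len(rhs_symbols), args)


def rewrite_rule(lhs, alternatives):
    """Render one grammar rule with mechanically generated actions.

    alternatives -- list of RHS symbol lists, one per alternative
    """
    rendered = []
    for rhs in alternatives:
        symbols = " ".join(rhs) or "/* EMPTY */"
        rendered.append("%s %s" % (symbols, echo_action(rhs)))
    return "%s:\t%s\n\t;" % (lhs, "\n\t| ".join(rendered))
```

Calling rewrite_rule("DropStmt", [["DROP", "TABLE", "relation_name"]]) yields a rule whose action simply passes $1, $2 and $3 to join_tokens. The hand-maintained ecpg-specific rules would be merged in by a separate step not shown here.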

You're quite right that nothing like this will reduce the amount that
maintainers have to know. But I think it could reduce the amount of
tedious, purely mechanical, and error-prone maintenance work that we
have to do to keep various files and lists in sync.

regards, tom lane

#18Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#16)
Re: Call for objections: revision of keyword classification

One possibility would be to put better comments into gram.y, and to back
those comments up with a validation script that *could* generate
keyword.c and other cross references. A bit more structure to the
comments and code would enable that I think.

A validation script is a good intermediate idea, similar to the
duplicate_oids script we have in include/catalog. It would make sure
keyword.c was sorted, and make sure each keyword appeared somewhere in
the lists of allowed function/column name productions.
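
The two checks themselves are small once the data is in hand. A Python sketch, assuming the keyword strings and the grammar lists have already been extracted from keyword.c and gram.y (that extraction is not shown and is likely the harder part); the function name is hypothetical:

```python
def check_keywords(keyword_table, grammar_lists):
    """Return a list of problems; an empty list means both checks pass.

    keyword_table -- keyword strings in the order they appear in keyword.c
    grammar_lists -- dict mapping list name (e.g. "ColId") to its tokens
    """
    problems = []

    # Check 1: entries must be in strictly ascending order.
    for prev, cur in zip(keyword_table, keyword_table[1:]):
        if prev >= cur:
            problems.append("out of order: %s before %s" % (prev, cur))

    # Check 2: every keyword must appear in at least one grammar list.
    listed = set()
    for words in grammar_lists.values():
        listed.update(w.lower() for w in words)
    for word in keyword_table:
        if word not in listed:
            problems.append("not in any grammar list: %s" % word)

    return problems
```

Like duplicate_oids, this could be run by hand or from the build and would fail loudly when it returns anything.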

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#18)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

A validation script is a good intermediate idea,

IMHO a validation script would be *far* harder than the alternative
I'm proposing, because it'd have to parse and interpret gram.y and
keyword.c. Building a correct-by-construction set of keyword lists
seems much easier than checking their rather messy representation
in those files.

regards, tom lane

#20Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#19)
Re: Call for objections: revision of keyword classification

Bruce Momjian <pgman@candle.pha.pa.us> writes:

A validation script is a good intermediate idea,

IMHO a validation script would be *far* harder than the alternative
I'm proposing, because it'd have to parse and interpret gram.y and
keyword.c. Building a correct-by-construction set of keyword lists
seems much easier than checking their rather messy representation
in those files.

Agreed. The validation script just avoided the indirection problem
mentioned by Thomas.

#21Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#1)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#21)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#23)
#25Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#22)
#26Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#23)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#26)