Lexical Structure - String Constants

Started by Sérgio Saquetim, almost 12 years ago, 3 messages, docs
#1 Sérgio Saquetim
sergiosaquetim@gmail.com

Hi,

I'm building a SQL lexer/parser in Java from scratch as a hobby project,
aiming for compliance with PostgreSQL 9.3. While reading chapter 4,
section 4.1 (
http://www.postgresql.org/docs/9.3/interactive/sql-syntax-lexical.html),
I noticed a few things I thought I should mention:

In section 4.1.2.1, the following text introduces us to SQL's bizarre
multiline/multisegment split style: "Two string constants that are only
separated by whitespace with at least one newline are concatenated and
effectively treated as if the string had been written as one constant."

The text does not say whether comments are allowed between segments, so I
ran a few tests in psql (PostgreSQL 9.3.4):

version
------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-16ubuntu6) 4.8.2, 64-bit
(1 row)

postgres=# SELECT 'a'
'b';
?column?
----------
ab
(1 row)

postgres=# SELECT 'a' --comment
'b';
?column?
----------
ab
(1 row)

So far everything worked, but I got a different result with a C-style block
comment:

postgres=# SELECT 'a' /*comment*/
'b';
ERROR: syntax error at or near "'b'"
LINE 2: 'b';

So line-style comments (--) are accepted between segments, but C-style
block comments (/* */) are not. Do you think this difference in behavior
should be mentioned in the docs?
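
For anyone writing a similar lexer, the behavior observed above could be sketched in Java roughly as follows. This is only a minimal illustration modelled on the three psql tests in this message, not PostgreSQL's actual lexer rules. The class and method names (StringContinuation, readString, readSegment, skipContinuationGap) are hypothetical, and the sketch assumes plain single-quoted constants in which a doubled quote ('') stands for one embedded quote.

```java
final class StringContinuation {

    /**
     * Reads the string constant that starts at src[start] (a single quote)
     * and returns its value, joining any continuation segments.
     */
    static String readString(String src, int start) {
        StringBuilder value = new StringBuilder();
        int pos = start;
        while (true) {
            // After this call, pos is just past the segment's closing quote.
            pos = readSegment(src, pos, value);
            int next = skipContinuationGap(src, pos);
            boolean continues = next >= 0
                    && next < src.length()
                    && src.charAt(next) == '\'';
            if (!continues) {
                return value.toString();
            }
            pos = next;
        }
    }

    /** Appends one quoted segment to out; returns the index after its closing quote. */
    static int readSegment(String src, int pos, StringBuilder out) {
        if (pos >= src.length() || src.charAt(pos) != '\'') {
            throw new IllegalArgumentException("expected opening quote at " + pos);
        }
        pos++;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c != '\'') {
                out.append(c);
                pos++;
            } else if (pos + 1 < src.length() && src.charAt(pos + 1) == '\'') {
                out.append('\'');   // doubled quote: one embedded quote
                pos += 2;
            } else {
                return pos + 1;     // closing quote
            }
        }
        throw new IllegalArgumentException("unterminated string constant");
    }

    /**
     * Skips whitespace and "--" comments after a segment. Returns the index
     * after the gap, or -1 if the gap contained no newline. A block comment
     * is not skipped, so it ends the gap, as in the third test above.
     */
    static int skipContinuationGap(String src, int pos) {
        boolean sawNewline = false;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '\n' || c == '\r') {
                sawNewline = true;
                pos++;
            } else if (c == ' ' || c == '\t' || c == '\f') {
                pos++;
            } else if (src.startsWith("--", pos)) {
                // Line comment: skip to the end of the line, not past it.
                while (pos < src.length()
                        && src.charAt(pos) != '\n'
                        && src.charAt(pos) != '\r') {
                    pos++;
                }
            } else {
                break;
            }
        }
        return sawNewline ? pos : -1;
    }
}
```

With this sketch, the first two inputs above yield the single value ab, while the block-comment input yields only a, leaving 'b' behind as a separate token for the parser to reject.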

I've also noticed the following statement in section 4.1.2.6: "At
least one digit must follow the exponent marker (e), if one is present."

As I understand it, this means the following statement should not be
valid, because the exponent marker is not followed by at least one digit.
Yet the expression is evaluated successfully:

postgres=# SELECT 10e;
e
----
10
(1 row)

That said, I live in Brazil and English is not my first language so I may
be mistaken, but I thought I should bring this to this list.

Regards,

Sérgio Saquetim

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sérgio Saquetim (#1)
Re: Lexical Structure - String Constants

Sérgio Saquetim <sergiosaquetim@gmail.com> writes:

So line-style comments (--) are accepted between segments, but C-style
block comments (/* */) are not. Do you think this difference in behavior
should be mentioned in the docs?

Hm, interesting. It looks to me like modern versions of the SQL spec
require either -- or /* ... */ style comments to be allowed between
segments of a quoted literal. This is pretty bad taste in language
design, if you ask me, but that's what it seems to say. I think that
our current lexer rules date from before the SQL standard even had
/* ... */ style comments, which is why the lexer isn't taking it.

I've also noticed the following statement in section 4.1.2.6: "At
least one digit must follow the exponent marker (e), if one is present."

As I understand it, this means the following statement should not be
valid, because the exponent marker is not followed by at least one digit.
Yet the expression is evaluated successfully:

postgres=# SELECT 10e;
e
----
10
(1 row)

"10e" is not a valid number, just like the manual says. But "10" is a
valid number, and "e" is a valid column alias, so this is equivalent
to "SELECT 10 AS e". There's no requirement for white space between
adjacent tokens, if the tokens couldn't validly be run together into
one token.
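
As a rough illustration of that tokenization rule, here is a minimal Java sketch of a numeric-constant scanner that takes the exponent marker only when at least one digit follows it. It is an assumption-laden example for a hand-written lexer, not the code PostgreSQL uses. The names NumericScanner, scanNumber and skipDigits are hypothetical, and the accepted forms are those listed in section 4.1.2.6 (digits, digits.[digits], [digits].digits, each with an optional e[+-]digits exponent).

```java
final class NumericScanner {

    /**
     * Returns the index just past the numeric constant that starts at pos,
     * or pos itself if no numeric constant starts there.
     */
    static int scanNumber(String src, int pos) {
        int i = skipDigits(src, pos);
        boolean hasDigits = i > pos;

        // Optional decimal point; digits are needed on at least one side.
        if (i < src.length() && src.charAt(i) == '.') {
            int fracEnd = skipDigits(src, i + 1);
            if (hasDigits || fracEnd > i + 1) {
                hasDigits = true;
                i = fracEnd;
            }
        }
        if (!hasDigits) {
            return pos;
        }

        // Optional exponent: consume it only if a digit follows the marker
        // (after an optional sign). Otherwise leave the 'e' for the next token.
        if (i < src.length() && (src.charAt(i) == 'e' || src.charAt(i) == 'E')) {
            int j = i + 1;
            if (j < src.length() && (src.charAt(j) == '+' || src.charAt(j) == '-')) {
                j++;
            }
            int expEnd = skipDigits(src, j);
            if (expEnd > j) {
                i = expEnd;
            }
        }
        return i;
    }

    /** Returns the index of the first non-digit character at or after pos. */
    static int skipDigits(String src, int pos) {
        while (pos < src.length()
                && src.charAt(pos) >= '0'
                && src.charAt(pos) <= '9') {
            pos++;
        }
        return pos;
    }
}
```

For the input 10e, scanNumber stops after 10, so the remaining e is left to be scanned as an identifier; for 10e5 the whole text is one numeric constant.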

regards, tom lane

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)

#3 Sérgio Saquetim
sergiosaquetim@gmail.com
In reply to: Tom Lane (#2)
Re: Lexical Structure - String Constants

"10e" is not a valid number, just like the manual says. But "10" is a
valid number, and "e" is a valid column alias, so this is equivalent
to "SELECT 10 AS e". There's no requirement for white space between
adjacent tokens, if the tokens couldn't validly be run together into
one token.

Thanks Tom,

I hadn't noticed that. I'll refactor my lexer to deal with it.

Regards,

Sérgio Saquetim