trailing junk in numeric literals

Started by Peter Eisentrautabout 5 years ago10 messages
#1Peter Eisentraut
peter.eisentraut@enterprisedb.com

I was surprised to find that this doesn't error:

=> select 100a;
a
-----
100

I suspect this and similar cases used to error before aliases without AS
were introduced. But now this seems possibly problematic. Should we
try to handle this better?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: trailing junk in numeric literals

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

I was surprised to find that this doesn't error:
=> select 100a;
a
-----
100

I suspect this and similar cases used to error before aliases without AS
were introduced. But now this seems possibly problematic. Should we
try to handle this better?

Meh. I think you'd get more brickbats than kudos if you start insisting
on a space there.

I'm too lazy to try to decipher the SQL spec right now, but ISTR that
it insists on whitespace between a numeric literal and an identifier.
So strictly speaking this SQL code is nonstandard anyway. But our
lexer has always been forgiving about not requiring space if it's
not logically necessary to separate tokens. I doubt trying to
change that would improve matters.

regards, tom lane

#3Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Tom Lane (#2)
Re: trailing junk in numeric literals

On 2020-12-28 21:54, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

I was surprised to find that this doesn't error:
=> select 100a;
a
-----
100

I suspect this and similar cases used to error before aliases without AS
were introduced. But now this seems possibly problematic. Should we
try to handle this better?

Meh. I think you'd get more brickbats than kudos if you start insisting
on a space there.

I'm too lazy to try to decipher the SQL spec right now, but ISTR that
it insists on whitespace between a numeric literal and an identifier.

Yeah, non-delimiter tokens are supposed to be separated by delimiter tokens.

So strictly speaking this SQL code is nonstandard anyway. But our
lexer has always been forgiving about not requiring space if it's
not logically necessary to separate tokens. I doubt trying to
change that would improve matters.

Well, the idea is to diagnose potential typos better. But if there is
no interest, then that's fine.

#4Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Peter Eisentraut (#3)
Re: trailing junk in numeric literals

Hello Peter,

My 0.02€:

So strictly speaking this SQL code is nonstandard anyway. But our
lexer has always been forgiving about not requiring space if it's
not logically necessary to separate tokens. I doubt trying to
change that would improve matters.

Well, the idea is to diagnose potential typos better. But if there is no
interest, then that's fine.

ISTM that silently accepting bogus syntax hides bugs rather than helps
users. I'm personaly all for fixing these, especially when I'm said user.

My latest catch was:

SELECT TIMESTAMP '2020-12-29Z06:16:18'; # 2020-12-29 00:00:00

But:

SELECT TIMESTAMPTZ '2020-12-29Z06:16:18'; # 2020-12-29 07:16:18+01
SELECT TIMESTAMP '2020-12-29T06:16:18'; # 2020-12-29 06:16:18

I happen to type a O which is close to 0 for which the shift key is also
needed on the French keyboard. This makes the unhelpful:

SELECT 12O; # 12 as O

I think that the policy should be to help user by detecting mistyped
entries, not trying to interpret them out of the norm and expectations.

--
Fabien.

#5Vik Fearing
vik@postgresfriends.org
In reply to: Peter Eisentraut (#3)
Re: trailing junk in numeric literals

On 12/29/20 10:18 AM, Peter Eisentraut wrote:

On 2020-12-28 21:54, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

I was surprised to find that this doesn't error:
=> select 100a;
    a
-----
   100

I suspect this and similar cases used to error before aliases without AS
were introduced.  But now this seems possibly problematic.  Should we
try to handle this better?

Meh.  I think you'd get more brickbats than kudos if you start insisting
on a space there.

I'm too lazy to try to decipher the SQL spec right now, but ISTR that
it insists on whitespace between a numeric literal and an identifier.

Yeah, non-delimiter tokens are supposed to be separated by delimiter
tokens.

So strictly speaking this SQL code is nonstandard anyway.  But our
lexer has always been forgiving about not requiring space if it's
not logically necessary to separate tokens.  I doubt trying to
change that would improve matters.

Well, the idea is to diagnose potential typos better.  But if there is
no interest, then that's fine.

I am in favor of such a change so that we can also accept 1_000_000
which currently parses as "1 AS _000_000" (which also isn't compliant
because identifiers cannot start with an underscore, but I don't want to
take it that far).

It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110
without most of it being interpreted as an alias.
--
Vik Fearing

#6Andreas Karlsson
andreas@proxel.se
In reply to: Vik Fearing (#5)
Re: trailing junk in numeric literals

On 1/16/21 2:02 PM, Vik Fearing wrote:

I am in favor of such a change so that we can also accept 1_000_000
which currently parses as "1 AS _000_000" (which also isn't compliant
because identifiers cannot start with an underscore, but I don't want to
take it that far).

It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110
without most of it being interpreted as an alias.

That would be a nice feature. Is it part of the SQL standard?

Andreas

#7Vik Fearing
vik@postgresfriends.org
In reply to: Andreas Karlsson (#6)
Re: trailing junk in numeric literals

On 1/16/21 4:32 PM, Andreas Karlsson wrote:

On 1/16/21 2:02 PM, Vik Fearing wrote:

I am in favor of such a change so that we can also accept 1_000_000
which currently parses as "1 AS _000_000" (which also isn't compliant
because identifiers cannot start with an underscore, but I don't want to
take it that far).

It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110
without most of it being interpreted as an alias.

That would be a nice feature. Is it part of the SQL standard?

Yes, all of that is in the standard.
--
Vik Fearing

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#7)
Re: trailing junk in numeric literals

Vik Fearing <vik@postgresfriends.org> writes:

On 1/16/21 4:32 PM, Andreas Karlsson wrote:

On 1/16/21 2:02 PM, Vik Fearing wrote:

I am in favor of such a change so that we can also accept 1_000_000
which currently parses as "1 AS _000_000" (which also isn't compliant
because identifiers cannot start with an underscore, but I don't want to
take it that far).
It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110
without most of it being interpreted as an alias.

That would be a nice feature. Is it part of the SQL standard?

Yes, all of that is in the standard.

Really? Please cite chapter and verse. AFAICS in SQL:2011 5.3 <literal>,
a numeric literal can't contain any extraneous characters, just sign,
digits, optional decimal point, and optional exponent. Hex and octal
literals are certainly not there either.

regards, tom lane

#9Vik Fearing
vik@postgresfriends.org
In reply to: Tom Lane (#8)
Re: trailing junk in numeric literals

On 1/16/21 6:10 PM, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

On 1/16/21 4:32 PM, Andreas Karlsson wrote:

On 1/16/21 2:02 PM, Vik Fearing wrote:

I am in favor of such a change so that we can also accept 1_000_000
which currently parses as "1 AS _000_000" (which also isn't compliant
because identifiers cannot start with an underscore, but I don't want to
take it that far).
It would also allow us to have 0xdead_beef, 0o_777, and 0b1010_0000_1110
without most of it being interpreted as an alias.

That would be a nice feature. Is it part of the SQL standard?

Yes, all of that is in the standard.

Really? Please cite chapter and verse. AFAICS in SQL:2011 5.3 <literal>,
a numeric literal can't contain any extraneous characters, just sign,
digits, optional decimal point, and optional exponent. Hex and octal
literals are certainly not there either.

With respect, you are looking at a 10-year-old document and I am not.

5.3 <literal> has since been modified.
--
Vik Fearing

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#9)
Re: trailing junk in numeric literals

Vik Fearing <vik@postgresfriends.org> writes:

With respect, you are looking at a 10-year-old document and I am not.
5.3 <literal> has since been modified.

Is a newer version of the spec available online?

regards, tom lane