Non-decimal integer literals

Started by Peter Eisentrautover 4 years ago62 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

Here is a patch to add support for hexadecimal, octal, and binary
integer literals:

0x42E
0o112
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Those core parts are straightforward enough, but there are a bunch of
other places where integers are parsed, and one could consider in each
case whether they should get the same treatment, for example the
replication syntax lexer, or input function for oid, numeric, and
int2vector. There are also some opportunities to move some code around,
for example scanint8() could be in numutils.c. I have also looked with
some suspicion at some details of the number lexing in ecpg, but haven't
found anything I could break yet. Suggestions are welcome.

Attachments:

v1-0001-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v1-0001-Non-decimal-integer-literals.patch; x-mac-creator=0; x-mac-type=0Download+412-56
#2John Naylor
john.naylor@enterprisedb.com
In reply to: Peter Eisentraut (#1)
Re: Non-decimal integer literals

On Mon, Aug 16, 2021 at 5:52 AM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

Here is a patch to add support for hexadecimal, octal, and binary
integer literals:

0x42E
0o112
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

The one thing that jumped out at me on a cursory reading is the {integer}
rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token text to
figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?

--
John Naylor
EDB: http://www.enterprisedb.com

#3Peter Eisentraut
peter_e@gmx.net
In reply to: John Naylor (#2)
Re: Non-decimal integer literals

On 16.08.21 17:32, John Naylor wrote:

The one thing that jumped out at me on a cursory reading is
the {integer} rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token text to
figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?

Agreed, that can be done in a simpler way. Here is an updated patch.

Attachments:

v2-0001-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v2-0001-Non-decimal-integer-literals.patch; x-mac-creator=0; x-mac-type=0Download+422-63
#4Zhihong Yu
zyu@yugabyte.com
In reply to: Peter Eisentraut (#3)
Re: Non-decimal integer literals

On Tue, Sep 7, 2021 at 4:13 AM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 16.08.21 17:32, John Naylor wrote:

The one thing that jumped out at me on a cursory reading is
the {integer} rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token text to
figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?

Agreed, that can be done in a simpler way. Here is an updated patch.

Hi,
Minor comment:

+SELECT int4 '0o112';

Maybe involve digits of up to 7 in the octal test case.

Thanks

#5Vik Fearing
vik@postgresfriends.org
In reply to: Peter Eisentraut (#1)
Re: Non-decimal integer literals

On 8/16/21 11:51 AM, Peter Eisentraut wrote:

Here is a patch to add support for hexadecimal, octal, and binary
integer literals:

    0x42E
    0o112
    0b100101

per SQL:202x draft.

Is there any hope of adding the optional underscores? I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

(The standard does not allow identifiers to begin with _ but we do...)
--
Vik Fearing

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#5)
Re: Non-decimal integer literals

Vik Fearing <vik@postgresfriends.org> writes:

On 8/16/21 11:51 AM, Peter Eisentraut wrote:

Here is a patch to add support for hexadecimal, octal, and binary
integer literals:

    0x42E
    0o112
    0b100101

per SQL:202x draft.

Is there any hope of adding the optional underscores? I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today, e.g.

regression=# select 0x42e;
x42e
------
0
(1 row)

AFAIR we've seen exactly zero field demand for this feature,
so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

regards, tom lane

#7Vik Fearing
vik@postgresfriends.org
In reply to: Tom Lane (#6)
Re: Non-decimal integer literals

On 9/8/21 3:14 PM, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

Is there any hope of adding the optional underscores? I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today,

Ah, true that. So if this does go in, we may as well add the
underscores at the same time.

AFAIR we've seen exactly zero field demand for this feature,

I have often wanted something like this, even if I didn't bring it up on
this list. I have had customers who have wanted this, too. My response
has always been to show these exact problems to explain why it's not
possible, but if it's going to be in the standard then I favor doing it.

I have never really had a use for octal, but sometimes binary and hex
make things much clearer. Having a grouping separator for large numbers
is even more useful.

so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

I don't really see a hurry here. I am fine with waiting until the draft
becomes final.
--
Vik Fearing

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Zhihong Yu (#4)
Re: Non-decimal integer literals

On 07.09.21 13:50, Zhihong Yu wrote:

On 16.08.21 17:32, John Naylor wrote:

The one thing that jumped out at me on a cursory reading is
the {integer} rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token

text to

figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?

Agreed, that can be done in a simpler way.  Here is an updated patch.

Hi,
Minor comment:

+SELECT int4 '0o112';

Maybe involve digits of up to 7 in the octal test case.

Good point, here is a lightly updated patch.

Attachments:

v3-0001-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v3-0001-Non-decimal-integer-literals.patch; x-mac-creator=0; x-mac-type=0Download+425-66
#9Peter Eisentraut
peter_e@gmx.net
In reply to: Vik Fearing (#7)
Re: Non-decimal integer literals

On 09.09.21 16:08, Vik Fearing wrote:

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today,

Ah, true that. So if this does go in, we may as well add the
underscores at the same time.

Yeah, looks like I'll need to look into the identifier lexing issues
previously discussed. I'll attack that during the next commit fest.

so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

I don't really see a hurry here. I am fine with waiting until the draft
becomes final.

Right, the point is to explore this now so that it can be ready when the
standard is ready.

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#9)
Re: Non-decimal integer literals

On 28.09.21 17:30, Peter Eisentraut wrote:

On 09.09.21 16:08, Vik Fearing wrote:

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid
number-followed-by-identifier
today,

Ah, true that.  So if this does go in, we may as well add the
underscores at the same time.

Yeah, looks like I'll need to look into the identifier lexing issues
previously discussed.  I'll attack that during the next commit fest.

Here is an updated patch for this. It's the previous patch polished a
bit more, and it contains changes so that numeric literals reject
trailing identifier parts without whitespace in between, as discussed.
Maybe I should split that into incremental patches, but for now I only
have the one. I don't have a patch for the underscores in numeric
literals yet. It's in progress, but not ready.

Attachments:

v4-0001-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v4-0001-Non-decimal-integer-literals.patch; x-mac-creator=0; x-mac-type=0Download+531-86
#11Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#10)
Re: Non-decimal integer literals

On 01.11.21 07:09, Peter Eisentraut wrote:

Here is an updated patch for this.  It's the previous patch polished a
bit more, and it contains changes so that numeric literals reject
trailing identifier parts without whitespace in between, as discussed.
Maybe I should split that into incremental patches, but for now I only
have the one.  I don't have a patch for the underscores in numeric
literals yet.  It's in progress, but not ready.

Here is a progressed version of this work, split into more incremental
patches. The first three patches are harmless code cleanups. Patch 3
has an interesting naming conflict, noted in the commit message; ideas
welcome. Patches 4 and 5 handle the rejection of trailing junk after
numeric literals, as discussed. I have expanded that compared to the v4
patch to also cover non-integer literals. It also comes with more tests
now. Patch 6 is the titular introduction of non-decimal integer
literals, unchanged from before.

Attachments:

v5-0001-Improve-some-comments-in-scanner-files.patchtext/plain; charset=UTF-8; name=v5-0001-Improve-some-comments-in-scanner-files.patchDownload+25-20
v5-0002-Remove-unused-includes.patchtext/plain; charset=UTF-8; name=v5-0002-Remove-unused-includes.patchDownload+0-5
v5-0003-Move-scanint8-to-numutils.c.patchtext/plain; charset=UTF-8; name=v5-0003-Move-scanint8-to-numutils.c.patchDownload+117-123
v5-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchtext/plain; charset=UTF-8; name=v5-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchDownload+69-1
v5-0005-Reject-trailing-junk-after-numeric-literals.patchtext/plain; charset=UTF-8; name=v5-0005-Reject-trailing-junk-after-numeric-literals.patchDownload+61-52
v5-0006-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v5-0006-Non-decimal-integer-literals.patchDownload+508-89
#12Zhihong Yu
zyu@yugabyte.com
In reply to: Peter Eisentraut (#11)
Re: Non-decimal integer literals

On Thu, Nov 25, 2021 at 5:18 AM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 01.11.21 07:09, Peter Eisentraut wrote:

Here is an updated patch for this. It's the previous patch polished a
bit more, and it contains changes so that numeric literals reject
trailing identifier parts without whitespace in between, as discussed.
Maybe I should split that into incremental patches, but for now I only
have the one. I don't have a patch for the underscores in numeric
literals yet. It's in progress, but not ready.

Here is a progressed version of this work, split into more incremental
patches. The first three patches are harmless code cleanups. Patch 3
has an interesting naming conflict, noted in the commit message; ideas
welcome. Patches 4 and 5 handle the rejection of trailing junk after
numeric literals, as discussed. I have expanded that compared to the v4
patch to also cover non-integer literals. It also comes with more tests
now. Patch 6 is the titular introduction of non-decimal integer
literals, unchanged from before.

Hi,
For patch 3,

+int64
+pg_strtoint64(const char *s)

How about naming the above function pg_scanint64()?
pg_strtoint64xx() can be named pg_strtoint64() - this would align with
existing function:

pg_strtouint64(const char *str, char **endptr, int base)

Cheers

#13John Naylor
john.naylor@enterprisedb.com
In reply to: Zhihong Yu (#12)
Re: Non-decimal integer literals

Hi Peter,

0001

-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it
separately to
+ * parser, and there it gets coerced via doNegate().

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now, if you
like. I don't have any good ideas about 0003 at the moment.

0005

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1 ({integer}|{decimal})[Ee]
 realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
  }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor doesn't match
other failure states:

+SELECT 0b;
+ERROR:  invalid binary integer at or near "SELECT 0b"
+LINE 1: SELECT 0b;
+        ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^

We might consider some tests for ECPG since lack of coverage has been a
problem.

Also, I'm curious: how does the spec work as far as deciding the year of
release, or feature-freezing of new items?
--
John Naylor
EDB: http://www.enterprisedb.com

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Zhihong Yu (#12)
Re: Non-decimal integer literals

On 25.11.21 16:46, Zhihong Yu wrote:

For patch 3,

+int64
+pg_strtoint64(const char *s)

How about naming the above function pg_scanint64()?
pg_strtoint64xx() can be named pg_strtoint64() - this would align with
existing function:

pg_strtouint64(const char *str, char **endptr, int base)

That would be one way. But the existing pg_strtointNN() functions are
pretty widely used, so I would tend toward finding another name for the
less used pg_strtouint64(), maybe pg_strtouint64x() ("extended").

#15Peter Eisentraut
peter_e@gmx.net
In reply to: John Naylor (#13)
Re: Non-decimal integer literals

On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now, if
you like.

done

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1 ({integer}|{decimal})[Ee]
 realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

Yeah, it's a bit weird that not all the symbols are used in ecpg. I'll
look into explaining this better.

0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
  }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor doesn't
match other failure states:

ok

We might consider some tests for ECPG since lack of coverage has been a
problem.

right

Also, I'm curious: how does the spec work as far as deciding the year of
release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1. But the preparation work (up to v5-0005, and some other number
parsing refactoring that I'm seeing) could be considered for PG15.

I'll move this to the next CF and come back with an updated patch set in
a little while.

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#15)
Re: Non-decimal integer literals

There has been some other refactoring going on, which made this patch
set out of date. So here is an update.

The old pg_strtouint64() has been removed, so there is no longer a
naming concern with patch 0001. That one should be good to go.

I also found that yet another way to parse integers in pg_atoi() has
mostly faded away in utility, so I removed the last two callers and
removed the function in 0002 and 0003.

The remaining patches are as before, with some of the review comments
applied. I still need to write some lexing unit tests for ecpg, which I
haven't gotten to yet. This affects patches 0004 and 0005.

As mentioned before, patches 0006 and 0007 are more feature previews at
this point.

Show quoted text

On 01.12.21 16:47, Peter Eisentraut wrote:

On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now, if
you like.

done

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
  realfail1 ({integer}|{decimal})[Ee]
  realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

Yeah, it's a bit weird that not all the symbols are used in ecpg.  I'll
look into explaining this better.

0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
   }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor doesn't
match other failure states:

ok

We might consider some tests for ECPG since lack of coverage has been
a problem.

right

Also, I'm curious: how does the spec work as far as deciding the year
of release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1.  But the preparation work (up to v5-0005, and some other number
parsing refactoring that I'm seeing) could be considered for PG15.

I'll move this to the next CF and come back with an updated patch set in
a little while.

Attachments:

v6-0001-Move-scanint8-to-numutils.c.patchtext/plain; charset=UTF-8; name=v6-0001-Move-scanint8-to-numutils.c.patchDownload+103-123
v6-0002-Remove-one-use-of-pg_atoi.patchtext/plain; charset=UTF-8; name=v6-0002-Remove-one-use-of-pg_atoi.patchDownload+1-2
v6-0003-Remove-pg_atoi.patchtext/plain; charset=UTF-8; name=v6-0003-Remove-pg_atoi.patchDownload+28-94
v6-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchtext/plain; charset=UTF-8; name=v6-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchDownload+69-1
v6-0005-Reject-trailing-junk-after-numeric-literals.patchtext/plain; charset=UTF-8; name=v6-0005-Reject-trailing-junk-after-numeric-literals.patchDownload+61-52
v6-0006-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v6-0006-Non-decimal-integer-literals.patchDownload+511-89
v6-0007-WIP-Underscores-in-numeric-literals.patchtext/plain; charset=UTF-8; name=v6-0007-WIP-Underscores-in-numeric-literals.patchDownload+59-11
#17Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#16)
Re: Non-decimal integer literals

Another modest update, because of the copyright year update preventing
the previous patches from applying cleanly.

I also did a bit of work on the ecpg scanner so that it also handles
some errors on par with the main scanner.

There is still no automated testing of this in ecpg, but I have a bunch
of single-line test files that can provoke various errors. I will keep
these around and maybe put them into something more formal in the future.

Show quoted text

On 30.12.21 10:43, Peter Eisentraut wrote:

There has been some other refactoring going on, which made this patch
set out of date.  So here is an update.

The old pg_strtouint64() has been removed, so there is no longer a
naming concern with patch 0001.  That one should be good to go.

I also found that yet another way to parse integers in pg_atoi() has
mostly faded away in utility, so I removed the last two callers and
removed the function in 0002 and 0003.

The remaining patches are as before, with some of the review comments
applied.  I still need to write some lexing unit tests for ecpg, which I
haven't gotten to yet.  This affects patches 0004 and 0005.

As mentioned before, patches 0006 and 0007 are more feature previews at
this point.

On 01.12.21 16:47, Peter Eisentraut wrote:

On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now,
if you like.

done

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
  realfail1 ({integer}|{decimal})[Ee]
  realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

Yeah, it's a bit weird that not all the symbols are used in ecpg.
I'll look into explaining this better.

0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
   }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor doesn't
match other failure states:

ok

We might consider some tests for ECPG since lack of coverage has been
a problem.

right

Also, I'm curious: how does the spec work as far as deciding the year
of release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1.  But the preparation work (up to v5-0005, and some other
number parsing refactoring that I'm seeing) could be considered for PG15.

I'll move this to the next CF and come back with an updated patch set
in a little while.

Attachments:

v7-0001-Move-scanint8-to-numutils.c.patchtext/plain; charset=UTF-8; name=v7-0001-Move-scanint8-to-numutils.c.patchDownload+103-123
v7-0002-Remove-one-use-of-pg_atoi.patchtext/plain; charset=UTF-8; name=v7-0002-Remove-one-use-of-pg_atoi.patchDownload+1-2
v7-0003-Remove-pg_atoi.patchtext/plain; charset=UTF-8; name=v7-0003-Remove-pg_atoi.patchDownload+28-94
v7-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchtext/plain; charset=UTF-8; name=v7-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchDownload+78-1
v7-0005-Reject-trailing-junk-after-numeric-literals.patchtext/plain; charset=UTF-8; name=v7-0005-Reject-trailing-junk-after-numeric-literals.patchDownload+91-58
v7-0006-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v7-0006-Non-decimal-integer-literals.patchDownload+529-98
v7-0007-WIP-Underscores-in-numeric-literals.patchtext/plain; charset=UTF-8; name=v7-0007-WIP-Underscores-in-numeric-literals.patchDownload+59-11
#18Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#17)
Re: Non-decimal integer literals

Rebased patch set

Show quoted text

On 13.01.22 14:42, Peter Eisentraut wrote:

Another modest update, because of the copyright year update preventing
the previous patches from applying cleanly.

I also did a bit of work on the ecpg scanner so that it also handles
some errors on par with the main scanner.

There is still no automated testing of this in ecpg, but I have a bunch
of single-line test files that can provoke various errors.  I will keep
these around and maybe put them into something more formal in the future.

On 30.12.21 10:43, Peter Eisentraut wrote:

There has been some other refactoring going on, which made this patch
set out of date.  So here is an update.

The old pg_strtouint64() has been removed, so there is no longer a
naming concern with patch 0001.  That one should be good to go.

I also found that yet another way to parse integers in pg_atoi() has
mostly faded away in utility, so I removed the last two callers and
removed the function in 0002 and 0003.

The remaining patches are as before, with some of the review comments
applied.  I still need to write some lexing unit tests for ecpg, which
I haven't gotten to yet.  This affects patches 0004 and 0005.

As mentioned before, patches 0006 and 0007 are more feature previews
at this point.

On 01.12.21 16:47, Peter Eisentraut wrote:

On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds
more natural. Aside from that, 0001 and 0002 can probably be pushed
now, if you like.

done

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
  realfail1 ({integer}|{decimal})[Ee]
  realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

Yeah, it's a bit weird that not all the symbols are used in ecpg.
I'll look into explaining this better.

0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
   }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor
doesn't match other failure states:

ok

We might consider some tests for ECPG since lack of coverage has
been a problem.

right

Also, I'm curious: how does the spec work as far as deciding the
year of release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1.  But the preparation work (up to v5-0005, and some other
number parsing refactoring that I'm seeing) could be considered for
PG15.

I'll move this to the next CF and come back with an updated patch set
in a little while.

Attachments:

v8-0001-Move-scanint8-to-numutils.c.patchtext/plain; charset=UTF-8; name=v8-0001-Move-scanint8-to-numutils.c.patchDownload+103-123
v8-0002-Remove-one-use-of-pg_atoi.patchtext/plain; charset=UTF-8; name=v8-0002-Remove-one-use-of-pg_atoi.patchDownload+1-2
v8-0003-Remove-pg_atoi.patchtext/plain; charset=UTF-8; name=v8-0003-Remove-pg_atoi.patchDownload+28-94
v8-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchtext/plain; charset=UTF-8; name=v8-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patchDownload+78-1
v8-0005-Reject-trailing-junk-after-numeric-literals.patchtext/plain; charset=UTF-8; name=v8-0005-Reject-trailing-junk-after-numeric-literals.patchDownload+91-58
v8-0006-Non-decimal-integer-literals.patchtext/plain; charset=UTF-8; name=v8-0006-Non-decimal-integer-literals.patchDownload+529-98
v8-0007-WIP-Underscores-in-numeric-literals.patchtext/plain; charset=UTF-8; name=v8-0007-WIP-Underscores-in-numeric-literals.patchDownload+59-11
#19Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#18)
Re: Non-decimal integer literals

On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Rebased patch set

What if someone finds this new behavior too permissive?

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#19)
Re: Non-decimal integer literals

On 24.01.22 19:53, Robert Haas wrote:

On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Rebased patch set

What if someone finds this new behavior too permissive?

Which part exactly? There are several different changes proposed here.

#21Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#20)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#18)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#21)
#24Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#23)
#25Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#22)
#26John Naylor
john.naylor@enterprisedb.com
In reply to: Peter Eisentraut (#24)
#27Christoph Berg
myon@debian.org
In reply to: Peter Eisentraut (#1)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: John Naylor (#26)
#29Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#28)
#30Junwang Zhao
zhjwpku@gmail.com
In reply to: Peter Eisentraut (#29)
#31Peter Eisentraut
peter_e@gmx.net
In reply to: Junwang Zhao (#30)
#32Junwang Zhao
zhjwpku@gmail.com
In reply to: Peter Eisentraut (#31)
#33John Naylor
john.naylor@enterprisedb.com
In reply to: Peter Eisentraut (#29)
#34Peter Eisentraut
peter_e@gmx.net
In reply to: John Naylor (#33)
#35Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#34)
#36John Naylor
john.naylor@enterprisedb.com
In reply to: Peter Eisentraut (#35)
#37David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#35)
#38David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#37)
#39John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#37)
#40Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Peter Eisentraut (#35)
#41Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#37)
#42David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#41)
#43Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#42)
#44Peter Eisentraut
peter_e@gmx.net
In reply to: Dean Rasheed (#40)
#45David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#43)
#46Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#38)
#47David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#46)
#48David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#44)
#49David Rowley
dgrowleyml@gmail.com
In reply to: John Naylor (#39)
#50Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: David Rowley (#49)
#51David Rowley
dgrowleyml@gmail.com
In reply to: Dean Rasheed (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#51)
#53Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#48)
#54Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#53)
#55Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Peter Eisentraut (#54)
#56Peter Eisentraut
peter_e@gmx.net
In reply to: Dean Rasheed (#55)
#57Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Peter Eisentraut (#56)
#58Joel Jacobson
joel@compiler.org
In reply to: Dean Rasheed (#55)
#59Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#58)
#60Ranier Vilela
ranier.vf@gmail.com
In reply to: Dean Rasheed (#59)
#61Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Ranier Vilela (#60)
#62Ranier Vilela
ranier.vf@gmail.com
In reply to: Dean Rasheed (#61)