Underscores in numeric literals

Started by Peter Eisentrautabout 3 years ago11 messages

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

1 attachment(s)

Here is a patch to add support for underscores in numeric literals, for
visual grouping, like

1_500_000_000
0b10001000_00000000
0o_1_755
0xFFFF_FFFF
1.618_034

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

TODO: float/numeric type input support

I did some performance tests similar to what was done in [0]/messages/by-id/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com and didn't
find any problematic deviations. Other tests would be welcome.

[0]: /messages/by-id/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
/messages/by-id/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com

Attachments:

v1-0001-Underscores-in-numeric-literals.patchtext/plain; charset=UTF-8; name=v1-0001-Underscores-in-numeric-literals.patchDownload

From 36aef78423e9adc6ebe72fb2a3cf43e385a2caca Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 27 Dec 2022 10:10:18 +0100
Subject: [PATCH v1] Underscores in numeric literals

Add support for underscores in numeric literals, for visual grouping,
like

    1_500_000_000
    0b10001000_00000000
    0o_1_755
    0xFFFF_FFFF
    1.618_034

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

TODO: float/numeric type input support

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                      |  14 ++
 src/backend/catalog/sql_features.txt          |   1 +
 src/backend/parser/scan.l                     |  63 ++++++--
 src/backend/utils/adt/numutils.c              | 144 ++++++++++++++++--
 src/fe_utils/psqlscan.l                       |  16 +-
 src/interfaces/ecpg/preproc/pgc.l             |  16 +-
 src/pl/plpgsql/src/expected/plpgsql_trap.out  |   2 +-
 src/pl/plpgsql/src/sql/plpgsql_trap.sql       |   2 +-
 src/test/regress/expected/int2.out            |  44 ++++++
 src/test/regress/expected/int4.out            |  44 ++++++
 src/test/regress/expected/int8.out            |  44 ++++++
 src/test/regress/expected/numerology.out      |  92 ++++++++++-
 src/test/regress/expected/partition_prune.out |   6 +-
 src/test/regress/sql/int2.sql                 |  14 ++
 src/test/regress/sql/int4.sql                 |  14 ++
 src/test/regress/sql/int8.sql                 |  14 ++
 src/test/regress/sql/numerology.sql           |  24 ++-
 src/test/regress/sql/partition_prune.sql      |   6 +-
 18 files changed, 509 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 956182e7c6..27e53b4b46 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -728,6 +728,20 @@ <title>Numeric Constants</title>
      </para>
     </note>
 
+    <para>
+     For visual grouping, underscores can be inserted between digits.  These
+     have no further effect on the value of the literal.  For example:
+<literallayout>1_500_000_000
+0b10001000_00000000
+0o_1_755
+0xFFFF_FFFF
+1.618_034
+</literallayout>
+     Underscores are not allowed at the start or end of a numeric constant or
+     a group of digits (that is, immediately before or after a period or the
+     <quote>e</quote>), and more than one underscore in a row is not allowed.
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index abad216b7e..3766762ae3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -528,6 +528,7 @@ T653	SQL-schema statements in external routines			YES
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
 T661	Non-decimal integer literals			YES	SQL:202x draft
+T662	Underscores in integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 9ad9e0c8ba..a1ea94ef06 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,6 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
+static char *strip_underscores(const char *in);
 static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
@@ -395,19 +396,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}+
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
@@ -1028,7 +1029,7 @@ other			.
 				}
 {numeric}		{
 					SET_YYLLOC();
-					yylval->str = pstrdup(yytext);
+					yylval->str = strip_underscores(yytext);
 					return FCONST;
 				}
 {numericfail}	{
@@ -1039,7 +1040,7 @@ other			.
 				}
 {real}			{
 					SET_YYLLOC();
-					yylval->str = pstrdup(yytext);
+					yylval->str = strip_underscores(yytext);
 					return FCONST;
 				}
 {realfail}		{
@@ -1357,6 +1358,30 @@ litbufdup(core_yyscan_t yyscanner)
 	return new;
 }
 
+static char *
+strip_underscores(const char *in)
+{
+	if (strchr(in, '_'))
+	{
+		char	   *out = palloc(strlen(in));
+		const char *p1;
+		char	   *p2;
+
+		p1 = in;
+		p2 = out;
+		while (*p1)
+		{
+			if (*p1 != '_')
+				*p2++ = *p1;
+			p1++;
+		}
+		*p2 = '\0';
+		return out;
+	}
+	else
+		return pstrdup(in);
+}
+
 /*
  * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
  * thing with {numeric}, ie digits and a decimal point.
@@ -1367,6 +1392,24 @@ process_integer_literal(const char *token, YYSTYPE *lval, int base)
 	int			val;
 	char	   *endptr;
 
+	if (strchr(token, '_'))
+	{
+		char	   *newtoken = palloc(strlen(token));
+		const char *p1;
+		char	   *p2;
+
+		p1 = token;
+		p2 = newtoken;
+		while (*p1)
+		{
+			if (*p1 != '_')
+				*p2++ = *p1;
+			p1++;
+		}
+		*p2 = '\0';
+		token = newtoken;
+	}
+
 	errno = 0;
 	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 7cded73e6e..07f04d07cb 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -141,8 +141,16 @@ pg_strtoint16_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr && (isxdigit((unsigned char) *ptr) || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT16_MIN / 16)))
 				goto out_of_range;
 
@@ -153,8 +161,16 @@ pg_strtoint16_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr && ((*ptr >= '0' && *ptr <= '7') || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT16_MIN / 8)))
 				goto out_of_range;
 
@@ -165,8 +181,16 @@ pg_strtoint16_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr && ((*ptr >= '0' && *ptr <= '1') || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT16_MIN / 2)))
 				goto out_of_range;
 
@@ -177,8 +201,20 @@ pg_strtoint16_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr && (isdigit((unsigned char) *ptr) || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* not two in a row */
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT16_MIN / 10)))
 				goto out_of_range;
 
@@ -190,6 +226,10 @@ pg_strtoint16_safe(const char *s, Node *escontext)
 	if (unlikely(ptr == firstdigit))
 		goto invalid_syntax;
 
+	/* underscore may not be last */
+	if (unlikely(*(ptr - 1) == '_'))
+		goto invalid_syntax;
+
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
 		ptr++;
@@ -268,8 +308,16 @@ pg_strtoint32_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr && (isxdigit((unsigned char) *ptr) || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT32_MIN / 16)))
 				goto out_of_range;
 
@@ -280,8 +328,16 @@ pg_strtoint32_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr && ((*ptr >= '0' && *ptr <= '7') || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT32_MIN / 8)))
 				goto out_of_range;
 
@@ -292,8 +348,16 @@ pg_strtoint32_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr && ((*ptr >= '0' && *ptr <= '1') || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT32_MIN / 2)))
 				goto out_of_range;
 
@@ -304,8 +368,20 @@ pg_strtoint32_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr && (isdigit((unsigned char) *ptr) || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* not two in a row */
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT32_MIN / 10)))
 				goto out_of_range;
 
@@ -317,6 +393,10 @@ pg_strtoint32_safe(const char *s, Node *escontext)
 	if (unlikely(ptr == firstdigit))
 		goto invalid_syntax;
 
+	/* underscore may not be last */
+	if (unlikely(*(ptr - 1) == '_'))
+		goto invalid_syntax;
+
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
 		ptr++;
@@ -395,8 +475,16 @@ pg_strtoint64_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr && (isxdigit((unsigned char) *ptr) || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT64_MIN / 16)))
 				goto out_of_range;
 
@@ -407,8 +495,16 @@ pg_strtoint64_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr && ((*ptr >= '0' && *ptr <= '7') || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT64_MIN / 8)))
 				goto out_of_range;
 
@@ -419,8 +515,16 @@ pg_strtoint64_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr && ((*ptr >= '0' && *ptr <= '1') || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT64_MIN / 2)))
 				goto out_of_range;
 
@@ -431,8 +535,20 @@ pg_strtoint64_safe(const char *s, Node *escontext)
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr && (isdigit((unsigned char) *ptr) || *ptr == '_'))
 		{
+			if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* not two in a row */
+				if (unlikely(*(ptr - 1) == '_'))
+					goto invalid_syntax;
+				ptr++;
+				continue;
+			}
+
 			if (unlikely(tmp > -(PG_INT64_MIN / 10)))
 				goto out_of_range;
 
@@ -444,6 +560,10 @@ pg_strtoint64_safe(const char *s, Node *escontext)
 	if (unlikely(ptr == firstdigit))
 		goto invalid_syntax;
 
+	/* underscore may not be last */
+	if (unlikely(*(ptr - 1) == '_'))
+		goto invalid_syntax;
+
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
 		ptr++;
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index cb1fc52138..44f80e4db4 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -333,19 +333,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 2c09c6cb4f..2761ae34b8 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -361,19 +361,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
diff --git a/src/pl/plpgsql/src/expected/plpgsql_trap.out b/src/pl/plpgsql/src/expected/plpgsql_trap.out
index 90cf6c2895..62d1679c28 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_trap.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_trap.out
@@ -141,7 +141,7 @@ begin
   declare x int;
   begin
     -- we assume this will take longer than 1 second:
-    select count(*) into x from generate_series(1, 1000000000000);
+    select count(*) into x from generate_series(1, 1_000_000_000_000);
   exception
     when others then
       raise notice 'caught others?';
diff --git a/src/pl/plpgsql/src/sql/plpgsql_trap.sql b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
index c6c1ad894b..5459b347e7 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_trap.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
@@ -88,7 +88,7 @@
   declare x int;
   begin
     -- we assume this will take longer than 1 second:
-    select count(*) into x from generate_series(1, 1000000000000);
+    select count(*) into x from generate_series(1, 1_000_000_000_000);
   exception
     when others then
       raise notice 'caught others?';
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 08c333b75a..73b4ee023c 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -440,3 +440,47 @@ SELECT int2 '-0x8001';
 ERROR:  value "-0x8001" is out of range for type smallint
 LINE 1: SELECT int2 '-0x8001';
                     ^
+-- underscores
+SELECT int2 '1_000';
+ int2 
+------
+ 1000
+(1 row)
+
+SELECT int2 '1_2_3';
+ int2 
+------
+  123
+(1 row)
+
+SELECT int2 '0xE_FF';
+ int2 
+------
+ 3839
+(1 row)
+
+SELECT int2 '0o2_73';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0b_10_0101';
+ int2 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int2 '_100';
+ERROR:  invalid input syntax for type smallint: "_100"
+LINE 1: SELECT int2 '_100';
+                    ^
+SELECT int2 '100_';
+ERROR:  invalid input syntax for type smallint: "100_"
+LINE 1: SELECT int2 '100_';
+                    ^
+SELECT int2 '10__000';
+ERROR:  invalid input syntax for type smallint: "10__000"
+LINE 1: SELECT int2 '10__000';
+                    ^
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 8386c7cdff..9c20574ca5 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -548,3 +548,47 @@ SELECT int4 '-0x80000001';
 ERROR:  value "-0x80000001" is out of range for type integer
 LINE 1: SELECT int4 '-0x80000001';
                     ^
+-- underscores
+SELECT int4 '1_000_000';
+  int4   
+---------
+ 1000000
+(1 row)
+
+SELECT int4 '1_2_3';
+ int4 
+------
+  123
+(1 row)
+
+SELECT int4 '0x1EEE_FFFF';
+   int4    
+-----------
+ 518979583
+(1 row)
+
+SELECT int4 '0o2_73';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0b_10_0101';
+ int4 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int4 '_100';
+ERROR:  invalid input syntax for type integer: "_100"
+LINE 1: SELECT int4 '_100';
+                    ^
+SELECT int4 '100_';
+ERROR:  invalid input syntax for type integer: "100_"
+LINE 1: SELECT int4 '100_';
+                    ^
+SELECT int4 '100__000';
+ERROR:  invalid input syntax for type integer: "100__000"
+LINE 1: SELECT int4 '100__000';
+                    ^
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 5b62b51be9..d9dca64e88 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -1044,3 +1044,47 @@ SELECT int8 '-0x8000000000000001';
 ERROR:  value "-0x8000000000000001" is out of range for type bigint
 LINE 1: SELECT int8 '-0x8000000000000001';
                     ^
+-- underscores
+SELECT int8 '1_000_000';
+  int8   
+---------
+ 1000000
+(1 row)
+
+SELECT int8 '1_2_3';
+ int8 
+------
+  123
+(1 row)
+
+SELECT int8 '0x1EEE_FFFF';
+   int8    
+-----------
+ 518979583
+(1 row)
+
+SELECT int8 '0o2_73';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0b_10_0101';
+ int8 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int8 '_100';
+ERROR:  invalid input syntax for type bigint: "_100"
+LINE 1: SELECT int8 '_100';
+                    ^
+SELECT int8 '100_';
+ERROR:  invalid input syntax for type bigint: "100_"
+LINE 1: SELECT int8 '100_';
+                    ^
+SELECT int8 '100__000';
+ERROR:  invalid input syntax for type bigint: "100__000"
+LINE 1: SELECT int8 '100__000';
+                    ^
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 15cd6b1672..6b9b089cb5 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -166,10 +166,6 @@ SELECT 0x0o;
 ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
-SELECT 1_2_3;
-ERROR:  trailing junk after numeric literal at or near "1_"
-LINE 1: SELECT 1_2_3;
-               ^
 SELECT 0.a;
 ERROR:  trailing junk after numeric literal at or near "0.a"
 LINE 1: SELECT 0.a;
@@ -234,6 +230,94 @@ SELECT 0x0y;
 ERROR:  trailing junk after numeric literal at or near "0x0y"
 LINE 1: SELECT 0x0y;
                ^
+-- underscores
+SELECT 1_000_000;
+ ?column? 
+----------
+  1000000
+(1 row)
+
+SELECT 1_2_3;
+ ?column? 
+----------
+      123
+(1 row)
+
+SELECT 0x1EEE_FFFF;
+ ?column?  
+-----------
+ 518979583
+(1 row)
+
+SELECT 0o2_73;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0b_10_0101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 1_000.000_005;
+  ?column?   
+-------------
+ 1000.000005
+(1 row)
+
+SELECT 1_000.;
+ ?column? 
+----------
+     1000
+(1 row)
+
+SELECT .000_005;
+ ?column? 
+----------
+ 0.000005
+(1 row)
+
+SELECT 1_000.5e0_1;
+ ?column? 
+----------
+    10005
+(1 row)
+
+-- error cases
+SELECT _100;
+ERROR:  column "_100" does not exist
+LINE 1: SELECT _100;
+               ^
+SELECT 100_;
+ERROR:  trailing junk after numeric literal at or near "100_"
+LINE 1: SELECT 100_;
+               ^
+SELECT 100__000;
+ERROR:  trailing junk after numeric literal at or near "100_"
+LINE 1: SELECT 100__000;
+               ^
+SELECT _1_000.5;
+ERROR:  syntax error at or near ".5"
+LINE 1: SELECT _1_000.5;
+                     ^
+SELECT 1_000_.5;
+ERROR:  trailing junk after numeric literal at or near "1_000_"
+LINE 1: SELECT 1_000_.5;
+               ^
+SELECT 1_000._5;
+ERROR:  trailing junk after numeric literal at or near "1_000._"
+LINE 1: SELECT 1_000._5;
+               ^
+SELECT 1_000.5_;
+ERROR:  trailing junk after numeric literal at or near "1_000.5_"
+LINE 1: SELECT 1_000.5_;
+               ^
+SELECT 1_000.5e_1;
+ERROR:  trailing junk after numeric literal at or near "1_000.5e"
+LINE 1: SELECT 1_000.5e_1;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 7555764c77..d700c00629 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1503,7 +1503,7 @@ explain (costs off) select * from like_op_noprune where a like '%BC';
 create table lparted_by_int2 (a smallint) partition by list (a);
 create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
 create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
-explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+explain (costs off) select * from lparted_by_int2 where a = 100_000_000_000_000;
         QUERY PLAN        
 --------------------------
  Result
@@ -1514,7 +1514,7 @@ create table rparted_by_int2 (a smallint) partition by range (a);
 create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
 create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
 -- all partitions pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
         QUERY PLAN        
 --------------------------
  Result
@@ -1523,7 +1523,7 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
 
 create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
 -- all partitions but rparted_by_int2_maxvalue pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
                       QUERY PLAN                      
 ------------------------------------------------------
  Seq Scan on rparted_by_int2_maxvalue rparted_by_int2
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index a812235ee5..ce8ac97963 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -141,3 +141,17 @@
 SELECT int2 '-0o100001';
 SELECT int2 '-0x8000';
 SELECT int2 '-0x8001';
+
+
+-- underscores
+
+SELECT int2 '1_000';
+SELECT int2 '1_2_3';
+SELECT int2 '0xE_FF';
+SELECT int2 '0o2_73';
+SELECT int2 '0b_10_0101';
+
+-- error cases
+SELECT int2 '_100';
+SELECT int2 '100_';
+SELECT int2 '10__000';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 9e6a40408a..146963edfb 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -196,3 +196,17 @@
 SELECT int4 '-0o20000000001';
 SELECT int4 '-0x80000000';
 SELECT int4 '-0x80000001';
+
+
+-- underscores
+
+SELECT int4 '1_000_000';
+SELECT int4 '1_2_3';
+SELECT int4 '0x1EEE_FFFF';
+SELECT int4 '0o2_73';
+SELECT int4 '0b_10_0101';
+
+-- error cases
+SELECT int4 '_100';
+SELECT int4 '100_';
+SELECT int4 '100__000';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 06f273ed58..c85717c072 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -277,3 +277,17 @@
 SELECT int8 '-0o1000000000000000000001';
 SELECT int8 '-0x8000000000000000';
 SELECT int8 '-0x8000000000000001';
+
+
+-- underscores
+
+SELECT int8 '1_000_000';
+SELECT int8 '1_2_3';
+SELECT int8 '0x1EEE_FFFF';
+SELECT int8 '0o2_73';
+SELECT int8 '0b_10_0101';
+
+-- error cases
+SELECT int8 '_100';
+SELECT int8 '100_';
+SELECT int8 '100__000';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 310d9e5766..1941c58e68 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -45,7 +45,6 @@
 -- error cases
 SELECT 123abc;
 SELECT 0x0o;
-SELECT 1_2_3;
 SELECT 0.a;
 SELECT 0.0a;
 SELECT .0a;
@@ -66,6 +65,29 @@
 SELECT 1x;
 SELECT 0x0y;
 
+-- underscores
+SELECT 1_000_000;
+SELECT 1_2_3;
+SELECT 0x1EEE_FFFF;
+SELECT 0o2_73;
+SELECT 0b_10_0101;
+
+SELECT 1_000.000_005;
+SELECT 1_000.;
+SELECT .000_005;
+SELECT 1_000.5e0_1;
+
+-- error cases
+SELECT _100;
+SELECT 100_;
+SELECT 100__000;
+
+SELECT _1_000.5;
+SELECT 1_000_.5;
+SELECT 1_000._5;
+SELECT 1_000.5_;
+SELECT 1_000.5e_1;
+
 
 --
 -- Test implicit type conversions
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d70bd8610c..fb0583f924 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -283,16 +283,16 @@ CREATE TABLE part_rev (b INT, c INT, a INT);
 create table lparted_by_int2 (a smallint) partition by list (a);
 create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
 create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
-explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+explain (costs off) select * from lparted_by_int2 where a = 100_000_000_000_000;
 
 create table rparted_by_int2 (a smallint) partition by range (a);
 create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
 create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
 -- all partitions pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
 create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
 -- all partitions but rparted_by_int2_maxvalue pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
 
 drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, boolrangep, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
 

base-commit: 63c844a0a5d70cdbd6ae0470d582d39e75ad8d66
-- 
2.39.0

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Peter Eisentraut (#1)

Re: Underscores in numeric literals

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Here is a patch to add support for underscores in numeric literals, for
visual grouping, like

1_500_000_000
0b10001000_00000000
0o_1_755
0xFFFF_FFFF
1.618_034

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.
TODO: float/numeric type input support

Hmm ... I'm on board with allowing this in SQL if the committee says
so. I'm not especially on board with accepting it in datatype input
functions. There's been zero demand for that AFAIR. Moreover,
I don't think we need the inevitable I/O performance hit, nor the
increased risk of accepting garbage, nor the certainty of
inconsistency with other places that don't get converted (because
they depend on strtoul() or whatever).

We already accept that numeric input is different from numeric
literals: you can't write Infinity or NaN in SQL without quotes.
So I don't see an argument that we have to allow this in numeric
input for consistency.

regards, tom lane

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tom Lane (#2)

Re: Underscores in numeric literals

On Tue, Dec 27, 2022 at 09:55:32AM -0500, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Here is a patch to add support for underscores in numeric literals, for
visual grouping, like

1_500_000_000
0b10001000_00000000
0o_1_755
0xFFFF_FFFF
1.618_034

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.
TODO: float/numeric type input support

Hmm ... I'm on board with allowing this in SQL if the committee says
so.

I'm not especially on board with accepting it in datatype input
functions. There's been zero demand for that AFAIR. Moreover,
I don't think we need the inevitable I/O performance hit, nor the
increased risk of accepting garbage, nor the certainty of
inconsistency with other places that don't get converted (because
they depend on strtoul() or whatever).

+1 to accept underscores only in literals and leave input functions
alone.

(When I realized that python3.6 changed to accept things like
int("3_5"), I felt compelled to write a wrapper to check for embedded
underscores and raise an exception in that case. And I'm sure it
affected performance.)

--
Justin

Andrew Dunstan

andrew@dunslane.net

about 3 years ago

In reply to: Tom Lane (#2)

Re: Underscores in numeric literals

On 2022-12-27 Tu 09:55, Tom Lane wrote:

We already accept that numeric input is different from numeric
literals: you can't write Infinity or NaN in SQL without quotes.
So I don't see an argument that we have to allow this in numeric
input for consistency.

That's almost the same, but not quite, ISTM. Those are things you can't
say without quotes, but here unless I'm mistaken you'd be disallowing
this style if you use quotes. I get the difficulties with input
functions, but it seems like we'll be building lots of grounds for
confusion.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Dean Rasheed

dean.a.rasheed@gmail.com

about 3 years ago

In reply to: Andrew Dunstan (#4)

1 attachment(s)

Re: Underscores in numeric literals

On Wed, 28 Dec 2022 at 14:28, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-12-27 Tu 09:55, Tom Lane wrote:

We already accept that numeric input is different from numeric
literals: you can't write Infinity or NaN in SQL without quotes.
So I don't see an argument that we have to allow this in numeric
input for consistency.

That's almost the same, but not quite, ISTM. Those are things you can't
say without quotes, but here unless I'm mistaken you'd be disallowing
this style if you use quotes. I get the difficulties with input
functions, but it seems like we'll be building lots of grounds for
confusion.

Yeah, it's easy to see why something like 'NaN' needs quotes, but it
would be harder to explain why something like 1000_000 mustn't have
quotes, and couldn't be used as input to COPY.

My feeling is that we should try to make the datatype input functions
accept anything that is legal syntax as a numeric literal, even if the
reverse isn't always possible.

That said, I think it's very important to minimise any performance
hit, especially in the existing case of inputs with no underscores.

Looking at the patch's changes to pg_strtointNN(), I think there's
more that can be done to reduce that performance hit. As it stands,
every input character is checked to see if it's an underscore, and
then there's a new check at the end to ensure that the input string
doesn't have a trailing underscore. Both of those can be avoided by
rearranging things a little, as in the attached v2 patch.

In the v2 patch, each input character is only compared with underscore
if it's not a digit, so in the case of an input with no underscores or
trailing spaces, the new checks for underscores are never executed.

In addition, if an underscore is seen, it now checks that the next
character is a digit. This eliminates the possibility of two
underscores in a row, and also of a trailing underscore, and so there
is no need for the final check for trailing underscores.

Thus, if the input consists only of digits, it never has to test for
underscores at all, and the performance hit for this case is
minimised.

My other concern with this patch is that the responsibility for
handling underscores is distributed over a couple of different places.
I had the same concern about the non-decimal integer patch, but at the
time I couldn't see any way round it. Now that we have soft error
handling though, I think that there is a way to improve this,
centralising the logic for both underscore and non-decimal handling to
one place for each datatype, reducing code duplication and the chances
of bugs.

For example, make_const() in the T_Float case has gained new code to
parse both the sign and base-prefix of the input, duplicating the
logic in pg_strtointNN(). That can now be avoided by having it call
pg_strtoint64_safe() with an ErrorSaveContext, instead of strtoi64().
In the process, it would then gain the ability to handle underscores,
so they wouldn't need to be stripped off elsewhere.

Similarly, process_integer_literal() could be made to call
pg_strtoint32_safe() with an ErrorSaveContext instead of strtoint(),
and it then wouldn't need to strip off underscores, or be passed the
number's base, since pg_strtoint32_safe() would handle all of that.

In addition, I think that strip_underscores() could then go away if
numeric_in() were made to handle underscores.

Essentially then, that would move all responsibility for parsing
underscores and non-decimal integers to the datatype input functions,
or their support routines, rather than having it distributed.

Regards,
Dean

Attachments:

v2-Underscores-in-numeric-literals.patchtext/x-patch; charset=US-ASCII; name=v2-Underscores-in-numeric-literals.patchDownload

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
new file mode 100644
index 956182e..27e53b4
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -729,6 +729,20 @@ $function$
     </note>
 
     <para>
+     For visual grouping, underscores can be inserted between digits.  These
+     have no further effect on the value of the literal.  For example:
+<literallayout>1_500_000_000
+0b10001000_00000000
+0o_1_755
+0xFFFF_FFFF
+1.618_034
+</literallayout>
+     Underscores are not allowed at the start or end of a numeric constant or
+     a group of digits (that is, immediately before or after a period or the
+     <quote>e</quote>), and more than one underscore in a row is not allowed.
+    </para>
+
+    <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
      <indexterm><primary>numeric</primary></indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
new file mode 100644
index abad216..3766762
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -528,6 +528,7 @@ T653	SQL-schema statements in external r
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
 T661	Non-decimal integer literals			YES	SQL:202x draft
+T662	Underscores in integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
new file mode 100644
index 9ad9e0c..a1ea94e
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,6 +124,7 @@ static void addlit(char *ytext, int ylen
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
+static char *strip_underscores(const char *in);
 static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
@@ -395,19 +396,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}+
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
@@ -1028,7 +1029,7 @@ other			.
 				}
 {numeric}		{
 					SET_YYLLOC();
-					yylval->str = pstrdup(yytext);
+					yylval->str = strip_underscores(yytext);
 					return FCONST;
 				}
 {numericfail}	{
@@ -1039,7 +1040,7 @@ other			.
 				}
 {real}			{
 					SET_YYLLOC();
-					yylval->str = pstrdup(yytext);
+					yylval->str = strip_underscores(yytext);
 					return FCONST;
 				}
 {realfail}		{
@@ -1357,6 +1358,30 @@ litbufdup(core_yyscan_t yyscanner)
 	return new;
 }
 
+static char *
+strip_underscores(const char *in)
+{
+	if (strchr(in, '_'))
+	{
+		char	   *out = palloc(strlen(in));
+		const char *p1;
+		char	   *p2;
+
+		p1 = in;
+		p2 = out;
+		while (*p1)
+		{
+			if (*p1 != '_')
+				*p2++ = *p1;
+			p1++;
+		}
+		*p2 = '\0';
+		return out;
+	}
+	else
+		return pstrdup(in);
+}
+
 /*
  * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
  * thing with {numeric}, ie digits and a decimal point.
@@ -1367,6 +1392,24 @@ process_integer_literal(const char *toke
 	int			val;
 	char	   *endptr;
 
+	if (strchr(token, '_'))
+	{
+		char	   *newtoken = palloc(strlen(token));
+		const char *p1;
+		char	   *p2;
+
+		p1 = token;
+		p2 = newtoken;
+		while (*p1)
+		{
+			if (*p1 != '_')
+				*p2++ = *p1;
+			p1++;
+		}
+		*p2 = '\0';
+		token = newtoken;
+	}
+
 	errno = 0;
 	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
new file mode 100644
index 1e1a982..17d573a
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -141,48 +141,99 @@ pg_strtoint16_safe(const char *s, Node *
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 16)))
-				goto out_of_range;
+			if (isxdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 16)))
+					goto out_of_range;
 
-			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+				tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 8)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '7')
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 8)))
+					goto out_of_range;
 
-			tmp = tmp * 8 + (*ptr++ - '0');
+				tmp = tmp * 8 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '7')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 2)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '1')
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 2)))
+					goto out_of_range;
 
-			tmp = tmp * 2 + (*ptr++ - '0');
+				tmp = tmp * 2 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '1')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 10)))
-				goto out_of_range;
+			if (isdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 10)))
+					goto out_of_range;
 
-			tmp = tmp * 10 + (*ptr++ - '0');
+				tmp = tmp * 10 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* and it must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 
@@ -268,48 +319,99 @@ pg_strtoint32_safe(const char *s, Node *
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 16)))
-				goto out_of_range;
+			if (isxdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 16)))
+					goto out_of_range;
 
-			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+				tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 8)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '7')
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 8)))
+					goto out_of_range;
 
-			tmp = tmp * 8 + (*ptr++ - '0');
+				tmp = tmp * 8 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '7')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 2)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '1')
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 2)))
+					goto out_of_range;
 
-			tmp = tmp * 2 + (*ptr++ - '0');
+				tmp = tmp * 2 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '1')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 10)))
-				goto out_of_range;
+			if (isdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 10)))
+					goto out_of_range;
 
-			tmp = tmp * 10 + (*ptr++ - '0');
+				tmp = tmp * 10 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* and it must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 
@@ -395,48 +497,99 @@ pg_strtoint64_safe(const char *s, Node *
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 16)))
-				goto out_of_range;
+			if (isxdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 16)))
+					goto out_of_range;
 
-			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+				tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 8)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '7')
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 8)))
+					goto out_of_range;
 
-			tmp = tmp * 8 + (*ptr++ - '0');
+				tmp = tmp * 8 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '7')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 2)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '1')
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 2)))
+					goto out_of_range;
 
-			tmp = tmp * 2 + (*ptr++ - '0');
+				tmp = tmp * 2 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '1')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 10)))
-				goto out_of_range;
+			if (isdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 10)))
+					goto out_of_range;
 
-			tmp = tmp * 10 + (*ptr++ - '0');
+				tmp = tmp * 10 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* and it must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
new file mode 100644
index cb1fc52..44f80e4
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -333,19 +333,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
new file mode 100644
index 2c09c6c..2761ae3
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -361,19 +361,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
diff --git a/src/pl/plpgsql/src/expected/plpgsql_trap.out b/src/pl/plpgsql/src/expected/plpgsql_trap.out
new file mode 100644
index 90cf6c2..62d1679
--- a/src/pl/plpgsql/src/expected/plpgsql_trap.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_trap.out
@@ -141,7 +141,7 @@ begin
   declare x int;
   begin
     -- we assume this will take longer than 1 second:
-    select count(*) into x from generate_series(1, 1000000000000);
+    select count(*) into x from generate_series(1, 1_000_000_000_000);
   exception
     when others then
       raise notice 'caught others?';
diff --git a/src/pl/plpgsql/src/sql/plpgsql_trap.sql b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
new file mode 100644
index c6c1ad8..5459b34
--- a/src/pl/plpgsql/src/sql/plpgsql_trap.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
@@ -88,7 +88,7 @@ begin
   declare x int;
   begin
     -- we assume this will take longer than 1 second:
-    select count(*) into x from generate_series(1, 1000000000000);
+    select count(*) into x from generate_series(1, 1_000_000_000_000);
   exception
     when others then
       raise notice 'caught others?';
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
new file mode 100644
index 08c333b..73b4ee0
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -440,3 +440,47 @@ SELECT int2 '-0x8001';
 ERROR:  value "-0x8001" is out of range for type smallint
 LINE 1: SELECT int2 '-0x8001';
                     ^
+-- underscores
+SELECT int2 '1_000';
+ int2 
+------
+ 1000
+(1 row)
+
+SELECT int2 '1_2_3';
+ int2 
+------
+  123
+(1 row)
+
+SELECT int2 '0xE_FF';
+ int2 
+------
+ 3839
+(1 row)
+
+SELECT int2 '0o2_73';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0b_10_0101';
+ int2 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int2 '_100';
+ERROR:  invalid input syntax for type smallint: "_100"
+LINE 1: SELECT int2 '_100';
+                    ^
+SELECT int2 '100_';
+ERROR:  invalid input syntax for type smallint: "100_"
+LINE 1: SELECT int2 '100_';
+                    ^
+SELECT int2 '10__000';
+ERROR:  invalid input syntax for type smallint: "10__000"
+LINE 1: SELECT int2 '10__000';
+                    ^
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
new file mode 100644
index 8386c7c..9c20574
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -548,3 +548,47 @@ SELECT int4 '-0x80000001';
 ERROR:  value "-0x80000001" is out of range for type integer
 LINE 1: SELECT int4 '-0x80000001';
                     ^
+-- underscores
+SELECT int4 '1_000_000';
+  int4   
+---------
+ 1000000
+(1 row)
+
+SELECT int4 '1_2_3';
+ int4 
+------
+  123
+(1 row)
+
+SELECT int4 '0x1EEE_FFFF';
+   int4    
+-----------
+ 518979583
+(1 row)
+
+SELECT int4 '0o2_73';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0b_10_0101';
+ int4 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int4 '_100';
+ERROR:  invalid input syntax for type integer: "_100"
+LINE 1: SELECT int4 '_100';
+                    ^
+SELECT int4 '100_';
+ERROR:  invalid input syntax for type integer: "100_"
+LINE 1: SELECT int4 '100_';
+                    ^
+SELECT int4 '100__000';
+ERROR:  invalid input syntax for type integer: "100__000"
+LINE 1: SELECT int4 '100__000';
+                    ^
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
new file mode 100644
index 5b62b51..d9dca64
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -1044,3 +1044,47 @@ SELECT int8 '-0x8000000000000001';
 ERROR:  value "-0x8000000000000001" is out of range for type bigint
 LINE 1: SELECT int8 '-0x8000000000000001';
                     ^
+-- underscores
+SELECT int8 '1_000_000';
+  int8   
+---------
+ 1000000
+(1 row)
+
+SELECT int8 '1_2_3';
+ int8 
+------
+  123
+(1 row)
+
+SELECT int8 '0x1EEE_FFFF';
+   int8    
+-----------
+ 518979583
+(1 row)
+
+SELECT int8 '0o2_73';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0b_10_0101';
+ int8 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int8 '_100';
+ERROR:  invalid input syntax for type bigint: "_100"
+LINE 1: SELECT int8 '_100';
+                    ^
+SELECT int8 '100_';
+ERROR:  invalid input syntax for type bigint: "100_"
+LINE 1: SELECT int8 '100_';
+                    ^
+SELECT int8 '100__000';
+ERROR:  invalid input syntax for type bigint: "100__000"
+LINE 1: SELECT int8 '100__000';
+                    ^
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
new file mode 100644
index 15cd6b1..6b9b089
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -166,10 +166,6 @@ SELECT 0x0o;
 ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
-SELECT 1_2_3;
-ERROR:  trailing junk after numeric literal at or near "1_"
-LINE 1: SELECT 1_2_3;
-               ^
 SELECT 0.a;
 ERROR:  trailing junk after numeric literal at or near "0.a"
 LINE 1: SELECT 0.a;
@@ -234,6 +230,94 @@ SELECT 0x0y;
 ERROR:  trailing junk after numeric literal at or near "0x0y"
 LINE 1: SELECT 0x0y;
                ^
+-- underscores
+SELECT 1_000_000;
+ ?column? 
+----------
+  1000000
+(1 row)
+
+SELECT 1_2_3;
+ ?column? 
+----------
+      123
+(1 row)
+
+SELECT 0x1EEE_FFFF;
+ ?column?  
+-----------
+ 518979583
+(1 row)
+
+SELECT 0o2_73;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0b_10_0101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 1_000.000_005;
+  ?column?   
+-------------
+ 1000.000005
+(1 row)
+
+SELECT 1_000.;
+ ?column? 
+----------
+     1000
+(1 row)
+
+SELECT .000_005;
+ ?column? 
+----------
+ 0.000005
+(1 row)
+
+SELECT 1_000.5e0_1;
+ ?column? 
+----------
+    10005
+(1 row)
+
+-- error cases
+SELECT _100;
+ERROR:  column "_100" does not exist
+LINE 1: SELECT _100;
+               ^
+SELECT 100_;
+ERROR:  trailing junk after numeric literal at or near "100_"
+LINE 1: SELECT 100_;
+               ^
+SELECT 100__000;
+ERROR:  trailing junk after numeric literal at or near "100_"
+LINE 1: SELECT 100__000;
+               ^
+SELECT _1_000.5;
+ERROR:  syntax error at or near ".5"
+LINE 1: SELECT _1_000.5;
+                     ^
+SELECT 1_000_.5;
+ERROR:  trailing junk after numeric literal at or near "1_000_"
+LINE 1: SELECT 1_000_.5;
+               ^
+SELECT 1_000._5;
+ERROR:  trailing junk after numeric literal at or near "1_000._"
+LINE 1: SELECT 1_000._5;
+               ^
+SELECT 1_000.5_;
+ERROR:  trailing junk after numeric literal at or near "1_000.5_"
+LINE 1: SELECT 1_000.5_;
+               ^
+SELECT 1_000.5e_1;
+ERROR:  trailing junk after numeric literal at or near "1_000.5e"
+LINE 1: SELECT 1_000.5e_1;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
new file mode 100644
index 7555764..d700c00
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1503,7 +1503,7 @@ explain (costs off) select * from like_o
 create table lparted_by_int2 (a smallint) partition by list (a);
 create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
 create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
-explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+explain (costs off) select * from lparted_by_int2 where a = 100_000_000_000_000;
         QUERY PLAN        
 --------------------------
  Result
@@ -1514,7 +1514,7 @@ create table rparted_by_int2 (a smallint
 create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
 create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
 -- all partitions pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
         QUERY PLAN        
 --------------------------
  Result
@@ -1523,7 +1523,7 @@ explain (costs off) select * from rparte
 
 create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
 -- all partitions but rparted_by_int2_maxvalue pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
                       QUERY PLAN                      
 ------------------------------------------------------
  Seq Scan on rparted_by_int2_maxvalue rparted_by_int2
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
new file mode 100644
index a812235..ce8ac97
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -141,3 +141,17 @@ SELECT int2 '-0o100000';
 SELECT int2 '-0o100001';
 SELECT int2 '-0x8000';
 SELECT int2 '-0x8001';
+
+
+-- underscores
+
+SELECT int2 '1_000';
+SELECT int2 '1_2_3';
+SELECT int2 '0xE_FF';
+SELECT int2 '0o2_73';
+SELECT int2 '0b_10_0101';
+
+-- error cases
+SELECT int2 '_100';
+SELECT int2 '100_';
+SELECT int2 '10__000';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
new file mode 100644
index 9e6a404..146963e
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -196,3 +196,17 @@ SELECT int4 '-0o20000000000';
 SELECT int4 '-0o20000000001';
 SELECT int4 '-0x80000000';
 SELECT int4 '-0x80000001';
+
+
+-- underscores
+
+SELECT int4 '1_000_000';
+SELECT int4 '1_2_3';
+SELECT int4 '0x1EEE_FFFF';
+SELECT int4 '0o2_73';
+SELECT int4 '0b_10_0101';
+
+-- error cases
+SELECT int4 '_100';
+SELECT int4 '100_';
+SELECT int4 '100__000';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
new file mode 100644
index 06f273e..c85717c
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -277,3 +277,17 @@ SELECT int8 '-0o1000000000000000000000';
 SELECT int8 '-0o1000000000000000000001';
 SELECT int8 '-0x8000000000000000';
 SELECT int8 '-0x8000000000000001';
+
+
+-- underscores
+
+SELECT int8 '1_000_000';
+SELECT int8 '1_2_3';
+SELECT int8 '0x1EEE_FFFF';
+SELECT int8 '0o2_73';
+SELECT int8 '0b_10_0101';
+
+-- error cases
+SELECT int8 '_100';
+SELECT int8 '100_';
+SELECT int8 '100__000';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
new file mode 100644
index 310d9e5..1941c58
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -45,7 +45,6 @@ SELECT -0x8000000000000001;
 -- error cases
 SELECT 123abc;
 SELECT 0x0o;
-SELECT 1_2_3;
 SELECT 0.a;
 SELECT 0.0a;
 SELECT .0a;
@@ -66,6 +65,29 @@ SELECT 0x;
 SELECT 1x;
 SELECT 0x0y;
 
+-- underscores
+SELECT 1_000_000;
+SELECT 1_2_3;
+SELECT 0x1EEE_FFFF;
+SELECT 0o2_73;
+SELECT 0b_10_0101;
+
+SELECT 1_000.000_005;
+SELECT 1_000.;
+SELECT .000_005;
+SELECT 1_000.5e0_1;
+
+-- error cases
+SELECT _100;
+SELECT 100_;
+SELECT 100__000;
+
+SELECT _1_000.5;
+SELECT 1_000_.5;
+SELECT 1_000._5;
+SELECT 1_000.5_;
+SELECT 1_000.5e_1;
+
 
 --
 -- Test implicit type conversions
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
new file mode 100644
index d70bd86..fb0583f
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -283,16 +283,16 @@ explain (costs off) select * from like_o
 create table lparted_by_int2 (a smallint) partition by list (a);
 create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
 create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
-explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+explain (costs off) select * from lparted_by_int2 where a = 100_000_000_000_000;
 
 create table rparted_by_int2 (a smallint) partition by range (a);
 create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
 create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
 -- all partitions pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
 create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
 -- all partitions but rparted_by_int2_maxvalue pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
 
 drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, boolrangep, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;

Dean Rasheed

dean.a.rasheed@gmail.com

about 3 years ago

In reply to: Dean Rasheed (#5)

Re: Underscores in numeric literals

Oh, one other minor nit -- in parser/scan.l:

-real           ({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real           ({decinteger}|{numeric})[Ee][-+]?{decinteger}+

the final "+" isn't necessary now.

Regards,
Dean

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Dean Rasheed (#5)

1 attachment(s)

Re: Underscores in numeric literals

On Wed, 4 Jan 2023 at 09:28, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

In addition, I think that strip_underscores() could then go away if
numeric_in() were made to handle underscores.

Essentially then, that would move all responsibility for parsing
underscores and non-decimal integers to the datatype input functions,
or their support routines, rather than having it distributed.

Here's an update with those changes.

Regards,
Dean

Attachments:

v3-Underscores-in-numeric-literals.patchtext/x-patch; charset=US-ASCII; name=v3-Underscores-in-numeric-literals.patchDownload

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
new file mode 100644
index 0ccddea..9509d0f
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -729,6 +729,20 @@ $function$
     </note>
 
     <para>
+     For visual grouping, underscores can be inserted between digits.  These
+     have no further effect on the value of the literal.  For example:
+<literallayout>1_500_000_000
+0b10001000_00000000
+0o_1_755
+0xFFFF_FFFF
+1.618_034
+</literallayout>
+     Underscores are not allowed at the start or end of a numeric constant or
+     a group of digits (that is, immediately before or after a period or the
+     <quote>e</quote>), and more than one underscore in a row is not allowed.
+    </para>
+
+    <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
      <indexterm><primary>numeric</primary></indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
new file mode 100644
index abad216..3766762
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -528,6 +528,7 @@ T653	SQL-schema statements in external r
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
 T661	Non-decimal integer literals			YES	SQL:202x draft
+T662	Underscores in integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
new file mode 100644
index f1967a3..5020b9f
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -19,6 +19,7 @@
 #include "catalog/pg_type.h"
 #include "mb/pg_wchar.h"
 #include "nodes/makefuncs.h"
+#include "nodes/miscnodes.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/subscripting.h"
 #include "parser/parse_coerce.h"
@@ -385,47 +386,11 @@ make_const(ParseState *pstate, A_Const *
 			{
 				/* could be an oversize integer as well as a float ... */
 
-				int			base = 10;
-				char	   *startptr;
-				int			sign;
-				char	   *testvalue;
+				ErrorSaveContext escontext = {T_ErrorSaveContext};
 				int64		val64;
-				char	   *endptr;
 
-				startptr = aconst->val.fval.fval;
-				if (startptr[0] == '-')
-				{
-					sign = -1;
-					startptr++;
-				}
-				else
-					sign = +1;
-				if (startptr[0] == '0')
-				{
-					if (startptr[1] == 'b' || startptr[1] == 'B')
-					{
-						base = 2;
-						startptr += 2;
-					}
-					else if (startptr[1] == 'o' || startptr[1] == 'O')
-					{
-						base = 8;
-						startptr += 2;
-					}
-					else if (startptr[1] == 'x' || startptr[1] == 'X')
-					{
-						base = 16;
-						startptr += 2;
-					}
-				}
-
-				if (sign == +1)
-					testvalue = startptr;
-				else
-					testvalue = psprintf("-%s", startptr);
-				errno = 0;
-				val64 = strtoi64(testvalue, &endptr, base);
-				if (errno == 0 && *endptr == '\0')
+				val64 = pg_strtoint64_safe(aconst->val.fval.fval, (Node *) &escontext);
+				if (!escontext.error_occurred)
 				{
 					/*
 					 * It might actually fit in int32. Probably only INT_MIN
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
new file mode 100644
index 1e821d4..b2216a9
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -37,10 +37,12 @@
 
 #include "common/string.h"
 #include "gramparse.h"
+#include "nodes/miscnodes.h"
 #include "parser/parser.h"		/* only needed for GUC variables */
 #include "parser/scansup.h"
 #include "port/pg_bitutils.h"
 #include "mb/pg_wchar.h"
+#include "utils/builtins.h"
 }
 
 %{
@@ -395,19 +397,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
@@ -1364,12 +1366,11 @@ litbufdup(core_yyscan_t yyscanner)
 static int
 process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
-	int			val;
-	char	   *endptr;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+	int32		val;
 
-	errno = 0;
-	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
-	if (*endptr != '\0' || errno == ERANGE)
+	val = pg_strtoint32_safe(token, (Node *) &escontext);
+	if (escontext.error_occurred)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
 		lval->str = pstrdup(token);
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index 898c520..a5bc89b
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -6928,10 +6928,7 @@ set_var_from_str(const char *str, const
 	}
 
 	if (!isdigit((unsigned char) *cp))
-		ereturn(escontext, false,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"numeric", str)));
+		goto invalid_syntax;
 
 	decdigits = (unsigned char *) palloc(strlen(cp) + DEC_DIGITS * 2);
 
@@ -6952,12 +6949,19 @@ set_var_from_str(const char *str, const
 		else if (*cp == '.')
 		{
 			if (have_dp)
-				ereturn(escontext, false,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s: \"%s\"",
-								"numeric", str)));
+				goto invalid_syntax;
 			have_dp = true;
 			cp++;
+			/* decimal point must not be followed by underscore */
+			if (*cp == '_')
+				goto invalid_syntax;
+		}
+		else if (*cp == '_')
+		{
+			/* underscore must be followed by more digits */
+			cp++;
+			if (!isdigit((unsigned char) *cp))
+				goto invalid_syntax;
 		}
 		else
 			break;
@@ -6970,17 +6974,8 @@ set_var_from_str(const char *str, const
 	/* Handle exponent, if any */
 	if (*cp == 'e' || *cp == 'E')
 	{
-		long		exponent;
-		char	   *endptr;
-
-		cp++;
-		exponent = strtol(cp, &endptr, 10);
-		if (endptr == cp)
-			ereturn(escontext, false,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s: \"%s\"",
-							"numeric", str)));
-		cp = endptr;
+		int64		exponent = 0;
+		bool		neg = false;
 
 		/*
 		 * At this point, dweight and dscale can't be more than about
@@ -6990,10 +6985,43 @@ set_var_from_str(const char *str, const
 		 * fit in storage format, make_result() will complain about it later;
 		 * for consistency use the same ereport errcode/text as make_result().
 		 */
-		if (exponent >= INT_MAX / 2 || exponent <= -(INT_MAX / 2))
-			ereturn(escontext, false,
-					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-					 errmsg("value overflows numeric format")));
+
+		/* exponent sign */
+		cp++;
+		if (*cp == '+')
+			cp++;
+		else if (*cp == '-')
+		{
+			neg = true;
+			cp++;
+		}
+
+		/* exponent digits */
+		if (!isdigit((unsigned char) *cp))
+			goto invalid_syntax;
+
+		while (*cp)
+		{
+			if (isdigit((unsigned char) *cp))
+			{
+				exponent = exponent * 10 + (*cp++ - '0');
+				if (exponent > INT_MAX / 2)
+					goto out_of_range;
+			}
+			else if (*cp == '_')
+			{
+				/* underscore must be followed by more digits */
+				cp++;
+				if (!isdigit((unsigned char) *cp))
+					goto invalid_syntax;
+			}
+			else
+				break;
+		}
+
+		if (neg)
+			exponent = -exponent;
+
 		dweight += (int) exponent;
 		dscale -= (int) exponent;
 		if (dscale < 0)
@@ -7045,6 +7073,17 @@ set_var_from_str(const char *str, const
 	*endptr = cp;
 
 	return true;
+
+out_of_range:
+	ereturn(escontext, false,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value overflows numeric format")));
+
+invalid_syntax:
+	ereturn(escontext, false,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"numeric", str)));
 }
 
 
@@ -7127,6 +7166,13 @@ set_var_from_non_decimal_integer_str(con
 				tmp = tmp * 16 + xdigit_value(*cp++);
 				mul = mul * 16;
 			}
+			else if (*cp == '_')
+			{
+				/* Underscore must be followed by more digits */
+				cp++;
+				if (!isxdigit((unsigned char) *cp))
+					goto invalid_syntax;
+			}
 			else
 				break;
 		}
@@ -7157,6 +7203,13 @@ set_var_from_non_decimal_integer_str(con
 				tmp = tmp * 8 + (*cp++ - '0');
 				mul = mul * 8;
 			}
+			else if (*cp == '_')
+			{
+				/* Underscore must be followed by more digits */
+				cp++;
+				if (*cp < '0' || *cp > '7')
+					goto invalid_syntax;
+			}
 			else
 				break;
 		}
@@ -7187,6 +7240,13 @@ set_var_from_non_decimal_integer_str(con
 				tmp = tmp * 2 + (*cp++ - '0');
 				mul = mul * 2;
 			}
+			else if (*cp == '_')
+			{
+				/* Underscore must be followed by more digits */
+				cp++;
+				if (*cp < '0' || *cp > '1')
+					goto invalid_syntax;
+			}
 			else
 				break;
 		}
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
new file mode 100644
index b0e412e..471fbb7
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -141,48 +141,99 @@ pg_strtoint16_safe(const char *s, Node *
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 16)))
-				goto out_of_range;
+			if (isxdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 16)))
+					goto out_of_range;
 
-			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+				tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 8)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '7')
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 8)))
+					goto out_of_range;
 
-			tmp = tmp * 8 + (*ptr++ - '0');
+				tmp = tmp * 8 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '7')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 2)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '1')
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 2)))
+					goto out_of_range;
 
-			tmp = tmp * 2 + (*ptr++ - '0');
+				tmp = tmp * 2 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '1')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT16_MIN / 10)))
-				goto out_of_range;
+			if (isdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT16_MIN / 10)))
+					goto out_of_range;
 
-			tmp = tmp * 10 + (*ptr++ - '0');
+				tmp = tmp * 10 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* and it must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 
@@ -268,48 +319,99 @@ pg_strtoint32_safe(const char *s, Node *
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 16)))
-				goto out_of_range;
+			if (isxdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 16)))
+					goto out_of_range;
 
-			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+				tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 8)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '7')
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 8)))
+					goto out_of_range;
 
-			tmp = tmp * 8 + (*ptr++ - '0');
+				tmp = tmp * 8 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '7')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 2)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '1')
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 2)))
+					goto out_of_range;
 
-			tmp = tmp * 2 + (*ptr++ - '0');
+				tmp = tmp * 2 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '1')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT32_MIN / 10)))
-				goto out_of_range;
+			if (isdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT32_MIN / 10)))
+					goto out_of_range;
 
-			tmp = tmp * 10 + (*ptr++ - '0');
+				tmp = tmp * 10 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* and it must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 
@@ -395,48 +497,99 @@ pg_strtoint64_safe(const char *s, Node *
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && isxdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 16)))
-				goto out_of_range;
+			if (isxdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 16)))
+					goto out_of_range;
 
-			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+				tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 8)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '7')
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 8)))
+					goto out_of_range;
 
-			tmp = tmp * 8 + (*ptr++ - '0');
+				tmp = tmp * 8 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '7')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
 	{
 		firstdigit = ptr += 2;
 
-		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 2)))
-				goto out_of_range;
+			if (*ptr >= '0' && *ptr <= '1')
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 2)))
+					goto out_of_range;
 
-			tmp = tmp * 2 + (*ptr++ - '0');
+				tmp = tmp * 2 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || *ptr < '0' || *ptr > '1')
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 	else
 	{
 		firstdigit = ptr;
 
-		while (*ptr && isdigit((unsigned char) *ptr))
+		while (*ptr)
 		{
-			if (unlikely(tmp > -(PG_INT64_MIN / 10)))
-				goto out_of_range;
+			if (isdigit((unsigned char) *ptr))
+			{
+				if (unlikely(tmp > -(PG_INT64_MIN / 10)))
+					goto out_of_range;
 
-			tmp = tmp * 10 + (*ptr++ - '0');
+				tmp = tmp * 10 + (*ptr++ - '0');
+			}
+			else if (*ptr == '_')
+			{
+				/* underscore may not be first */
+				if (unlikely(ptr == firstdigit))
+					goto invalid_syntax;
+				/* and it must be followed by more digits */
+				ptr++;
+				if (*ptr == '\0' || !isdigit((unsigned char) *ptr))
+					goto invalid_syntax;
+			}
+			else
+				break;
 		}
 	}
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
new file mode 100644
index bec5095..84754ac
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -333,19 +333,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
new file mode 100644
index 75815bd..dcd567e
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -361,19 +361,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
diff --git a/src/pl/plpgsql/src/expected/plpgsql_trap.out b/src/pl/plpgsql/src/expected/plpgsql_trap.out
new file mode 100644
index 90cf6c2..62d1679
--- a/src/pl/plpgsql/src/expected/plpgsql_trap.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_trap.out
@@ -141,7 +141,7 @@ begin
   declare x int;
   begin
     -- we assume this will take longer than 1 second:
-    select count(*) into x from generate_series(1, 1000000000000);
+    select count(*) into x from generate_series(1, 1_000_000_000_000);
   exception
     when others then
       raise notice 'caught others?';
diff --git a/src/pl/plpgsql/src/sql/plpgsql_trap.sql b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
new file mode 100644
index c6c1ad8..5459b34
--- a/src/pl/plpgsql/src/sql/plpgsql_trap.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
@@ -88,7 +88,7 @@ begin
   declare x int;
   begin
     -- we assume this will take longer than 1 second:
-    select count(*) into x from generate_series(1, 1000000000000);
+    select count(*) into x from generate_series(1, 1_000_000_000_000);
   exception
     when others then
       raise notice 'caught others?';
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
new file mode 100644
index 08c333b..73b4ee0
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -440,3 +440,47 @@ SELECT int2 '-0x8001';
 ERROR:  value "-0x8001" is out of range for type smallint
 LINE 1: SELECT int2 '-0x8001';
                     ^
+-- underscores
+SELECT int2 '1_000';
+ int2 
+------
+ 1000
+(1 row)
+
+SELECT int2 '1_2_3';
+ int2 
+------
+  123
+(1 row)
+
+SELECT int2 '0xE_FF';
+ int2 
+------
+ 3839
+(1 row)
+
+SELECT int2 '0o2_73';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0b_10_0101';
+ int2 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int2 '_100';
+ERROR:  invalid input syntax for type smallint: "_100"
+LINE 1: SELECT int2 '_100';
+                    ^
+SELECT int2 '100_';
+ERROR:  invalid input syntax for type smallint: "100_"
+LINE 1: SELECT int2 '100_';
+                    ^
+SELECT int2 '10__000';
+ERROR:  invalid input syntax for type smallint: "10__000"
+LINE 1: SELECT int2 '10__000';
+                    ^
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
new file mode 100644
index 8386c7c..9c20574
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -548,3 +548,47 @@ SELECT int4 '-0x80000001';
 ERROR:  value "-0x80000001" is out of range for type integer
 LINE 1: SELECT int4 '-0x80000001';
                     ^
+-- underscores
+SELECT int4 '1_000_000';
+  int4   
+---------
+ 1000000
+(1 row)
+
+SELECT int4 '1_2_3';
+ int4 
+------
+  123
+(1 row)
+
+SELECT int4 '0x1EEE_FFFF';
+   int4    
+-----------
+ 518979583
+(1 row)
+
+SELECT int4 '0o2_73';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0b_10_0101';
+ int4 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int4 '_100';
+ERROR:  invalid input syntax for type integer: "_100"
+LINE 1: SELECT int4 '_100';
+                    ^
+SELECT int4 '100_';
+ERROR:  invalid input syntax for type integer: "100_"
+LINE 1: SELECT int4 '100_';
+                    ^
+SELECT int4 '100__000';
+ERROR:  invalid input syntax for type integer: "100__000"
+LINE 1: SELECT int4 '100__000';
+                    ^
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
new file mode 100644
index 5b62b51..d9dca64
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -1044,3 +1044,47 @@ SELECT int8 '-0x8000000000000001';
 ERROR:  value "-0x8000000000000001" is out of range for type bigint
 LINE 1: SELECT int8 '-0x8000000000000001';
                     ^
+-- underscores
+SELECT int8 '1_000_000';
+  int8   
+---------
+ 1000000
+(1 row)
+
+SELECT int8 '1_2_3';
+ int8 
+------
+  123
+(1 row)
+
+SELECT int8 '0x1EEE_FFFF';
+   int8    
+-----------
+ 518979583
+(1 row)
+
+SELECT int8 '0o2_73';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0b_10_0101';
+ int8 
+------
+   37
+(1 row)
+
+-- error cases
+SELECT int8 '_100';
+ERROR:  invalid input syntax for type bigint: "_100"
+LINE 1: SELECT int8 '_100';
+                    ^
+SELECT int8 '100_';
+ERROR:  invalid input syntax for type bigint: "100_"
+LINE 1: SELECT int8 '100_';
+                    ^
+SELECT int8 '100__000';
+ERROR:  invalid input syntax for type bigint: "100__000"
+LINE 1: SELECT int8 '100__000';
+                    ^
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 9479652..56a3f36
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2144,12 +2144,17 @@ INSERT INTO num_input_test(n1) VALUES ('
 INSERT INTO num_input_test(n1) VALUES (' Infinity ');
 INSERT INTO num_input_test(n1) VALUES (' +inFinity ');
 INSERT INTO num_input_test(n1) VALUES (' -INFINITY ');
+INSERT INTO num_input_test(n1) VALUES ('12_000_000_000');
+INSERT INTO num_input_test(n1) VALUES ('12_000.123_456');
+INSERT INTO num_input_test(n1) VALUES ('23_000_000_000e-1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e+1_1');
 INSERT INTO num_input_test(n1) VALUES ('0b10001110111100111100001001010');
-INSERT INTO num_input_test(n1) VALUES ('  -0B1010101101010100101010011000110011101011000111110000101011010010  ');
+INSERT INTO num_input_test(n1) VALUES ('  -0B_1010_1011_0101_0100_1010_1001_1000_1100_1110_1011_0001_1111_0000_1010_1101_0010  ');
 INSERT INTO num_input_test(n1) VALUES ('  +0o112402761777 ');
-INSERT INTO num_input_test(n1) VALUES ('-0O001255245230633431670261');
+INSERT INTO num_input_test(n1) VALUES ('-0O0012_5524_5230_6334_3167_0261');
 INSERT INTO num_input_test(n1) VALUES ('-0x0000000000000000000000000deadbeef');
-INSERT INTO num_input_test(n1) VALUES (' 0X30b1F33a6DF0bD4E64DF9BdA7D15 ');
+INSERT INTO num_input_test(n1) VALUES (' 0X_30b1_F33a_6DF0_bD4E_64DF_9BdA_7D15 ');
 -- bad inputs
 INSERT INTO num_input_test(n1) VALUES ('     ');
 ERROR:  invalid input syntax for type numeric: "     "
@@ -2195,6 +2200,38 @@ INSERT INTO num_input_test(n1) VALUES ('
 ERROR:  invalid input syntax for type numeric: "+ infinity"
 LINE 1: INSERT INTO num_input_test(n1) VALUES ('+ infinity');
                                                ^
+INSERT INTO num_input_test(n1) VALUES ('_123');
+ERROR:  invalid input syntax for type numeric: "_123"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('_123');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('123_');
+ERROR:  invalid input syntax for type numeric: "123_"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('123_');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('12__34');
+ERROR:  invalid input syntax for type numeric: "12__34"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('12__34');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('123_.456');
+ERROR:  invalid input syntax for type numeric: "123_.456"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('123_.456');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('123._456');
+ERROR:  invalid input syntax for type numeric: "123._456"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('123._456');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('1.2e_34');
+ERROR:  invalid input syntax for type numeric: "1.2e_34"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('1.2e_34');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('1.2e34_');
+ERROR:  invalid input syntax for type numeric: "1.2e34_"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('1.2e34_');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('1.2e3__4');
+ERROR:  invalid input syntax for type numeric: "1.2e3__4"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('1.2e3__4');
+                                               ^
 INSERT INTO num_input_test(n1) VALUES ('0b1112');
 ERROR:  invalid input syntax for type numeric: "0b1112"
 LINE 1: INSERT INTO num_input_test(n1) VALUES ('0b1112');
@@ -2215,6 +2252,18 @@ INSERT INTO num_input_test(n1) VALUES ('
 ERROR:  invalid input syntax for type numeric: "0x12.34"
 LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x12.34');
                                                ^
+INSERT INTO num_input_test(n1) VALUES ('0x__1234');
+ERROR:  invalid input syntax for type numeric: "0x__1234"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x__1234');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('0x1234_');
+ERROR:  invalid input syntax for type numeric: "0x1234_"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x1234_');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('0x12__34');
+ERROR:  invalid input syntax for type numeric: "0x12__34"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x12__34');
+                                               ^
 SELECT * FROM num_input_test;
                 n1                 
 -----------------------------------
@@ -2231,13 +2280,18 @@ SELECT * FROM num_input_test;
                           Infinity
                           Infinity
                          -Infinity
+                       12000000000
+                      12000.123456
+                      2.3000000000
+                              1.23
+                              12.3
                          299792458
              -12345678901234567890
                         9999999999
              -12345678900987654321
                        -3735928559
  987654321234567898765432123456789
-(19 rows)
+(24 rows)
 
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('34.5', 'numeric');
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
new file mode 100644
index deb26d3..f662a50
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -178,10 +178,6 @@ SELECT 0x0o;
 ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
-SELECT 1_2_3;
-ERROR:  trailing junk after numeric literal at or near "1_"
-LINE 1: SELECT 1_2_3;
-               ^
 SELECT 0.a;
 ERROR:  trailing junk after numeric literal at or near "0.a"
 LINE 1: SELECT 0.a;
@@ -246,6 +242,94 @@ SELECT 0x0y;
 ERROR:  trailing junk after numeric literal at or near "0x0y"
 LINE 1: SELECT 0x0y;
                ^
+-- underscores
+SELECT 1_000_000;
+ ?column? 
+----------
+  1000000
+(1 row)
+
+SELECT 1_2_3;
+ ?column? 
+----------
+      123
+(1 row)
+
+SELECT 0x1EEE_FFFF;
+ ?column?  
+-----------
+ 518979583
+(1 row)
+
+SELECT 0o2_73;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0b_10_0101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 1_000.000_005;
+  ?column?   
+-------------
+ 1000.000005
+(1 row)
+
+SELECT 1_000.;
+ ?column? 
+----------
+     1000
+(1 row)
+
+SELECT .000_005;
+ ?column? 
+----------
+ 0.000005
+(1 row)
+
+SELECT 1_000.5e0_1;
+ ?column? 
+----------
+    10005
+(1 row)
+
+-- error cases
+SELECT _100;
+ERROR:  column "_100" does not exist
+LINE 1: SELECT _100;
+               ^
+SELECT 100_;
+ERROR:  trailing junk after numeric literal at or near "100_"
+LINE 1: SELECT 100_;
+               ^
+SELECT 100__000;
+ERROR:  trailing junk after numeric literal at or near "100_"
+LINE 1: SELECT 100__000;
+               ^
+SELECT _1_000.5;
+ERROR:  syntax error at or near ".5"
+LINE 1: SELECT _1_000.5;
+                     ^
+SELECT 1_000_.5;
+ERROR:  trailing junk after numeric literal at or near "1_000_"
+LINE 1: SELECT 1_000_.5;
+               ^
+SELECT 1_000._5;
+ERROR:  trailing junk after numeric literal at or near "1_000._"
+LINE 1: SELECT 1_000._5;
+               ^
+SELECT 1_000.5_;
+ERROR:  trailing junk after numeric literal at or near "1_000.5_"
+LINE 1: SELECT 1_000.5_;
+               ^
+SELECT 1_000.5e_1;
+ERROR:  trailing junk after numeric literal at or near "1_000.5e"
+LINE 1: SELECT 1_000.5e_1;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
new file mode 100644
index 7555764..d700c00
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1503,7 +1503,7 @@ explain (costs off) select * from like_o
 create table lparted_by_int2 (a smallint) partition by list (a);
 create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
 create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
-explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+explain (costs off) select * from lparted_by_int2 where a = 100_000_000_000_000;
         QUERY PLAN        
 --------------------------
  Result
@@ -1514,7 +1514,7 @@ create table rparted_by_int2 (a smallint
 create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
 create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
 -- all partitions pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
         QUERY PLAN        
 --------------------------
  Result
@@ -1523,7 +1523,7 @@ explain (costs off) select * from rparte
 
 create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
 -- all partitions but rparted_by_int2_maxvalue pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
                       QUERY PLAN                      
 ------------------------------------------------------
  Seq Scan on rparted_by_int2_maxvalue rparted_by_int2
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
new file mode 100644
index a812235..ce8ac97
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -141,3 +141,17 @@ SELECT int2 '-0o100000';
 SELECT int2 '-0o100001';
 SELECT int2 '-0x8000';
 SELECT int2 '-0x8001';
+
+
+-- underscores
+
+SELECT int2 '1_000';
+SELECT int2 '1_2_3';
+SELECT int2 '0xE_FF';
+SELECT int2 '0o2_73';
+SELECT int2 '0b_10_0101';
+
+-- error cases
+SELECT int2 '_100';
+SELECT int2 '100_';
+SELECT int2 '10__000';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
new file mode 100644
index 9e6a404..146963e
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -196,3 +196,17 @@ SELECT int4 '-0o20000000000';
 SELECT int4 '-0o20000000001';
 SELECT int4 '-0x80000000';
 SELECT int4 '-0x80000001';
+
+
+-- underscores
+
+SELECT int4 '1_000_000';
+SELECT int4 '1_2_3';
+SELECT int4 '0x1EEE_FFFF';
+SELECT int4 '0o2_73';
+SELECT int4 '0b_10_0101';
+
+-- error cases
+SELECT int4 '_100';
+SELECT int4 '100_';
+SELECT int4 '100__000';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
new file mode 100644
index 06f273e..c85717c
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -277,3 +277,17 @@ SELECT int8 '-0o1000000000000000000000';
 SELECT int8 '-0o1000000000000000000001';
 SELECT int8 '-0x8000000000000000';
 SELECT int8 '-0x8000000000000001';
+
+
+-- underscores
+
+SELECT int8 '1_000_000';
+SELECT int8 '1_2_3';
+SELECT int8 '0x1EEE_FFFF';
+SELECT int8 '0o2_73';
+SELECT int8 '0b_10_0101';
+
+-- error cases
+SELECT int8 '_100';
+SELECT int8 '100_';
+SELECT int8 '100__000';
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index fe93714..2db7656
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1039,12 +1039,17 @@ INSERT INTO num_input_test(n1) VALUES ('
 INSERT INTO num_input_test(n1) VALUES (' Infinity ');
 INSERT INTO num_input_test(n1) VALUES (' +inFinity ');
 INSERT INTO num_input_test(n1) VALUES (' -INFINITY ');
+INSERT INTO num_input_test(n1) VALUES ('12_000_000_000');
+INSERT INTO num_input_test(n1) VALUES ('12_000.123_456');
+INSERT INTO num_input_test(n1) VALUES ('23_000_000_000e-1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e+1_1');
 INSERT INTO num_input_test(n1) VALUES ('0b10001110111100111100001001010');
-INSERT INTO num_input_test(n1) VALUES ('  -0B1010101101010100101010011000110011101011000111110000101011010010  ');
+INSERT INTO num_input_test(n1) VALUES ('  -0B_1010_1011_0101_0100_1010_1001_1000_1100_1110_1011_0001_1111_0000_1010_1101_0010  ');
 INSERT INTO num_input_test(n1) VALUES ('  +0o112402761777 ');
-INSERT INTO num_input_test(n1) VALUES ('-0O001255245230633431670261');
+INSERT INTO num_input_test(n1) VALUES ('-0O0012_5524_5230_6334_3167_0261');
 INSERT INTO num_input_test(n1) VALUES ('-0x0000000000000000000000000deadbeef');
-INSERT INTO num_input_test(n1) VALUES (' 0X30b1F33a6DF0bD4E64DF9BdA7D15 ');
+INSERT INTO num_input_test(n1) VALUES (' 0X_30b1_F33a_6DF0_bD4E_64DF_9BdA_7D15 ');
 
 -- bad inputs
 INSERT INTO num_input_test(n1) VALUES ('     ');
@@ -1058,11 +1063,22 @@ INSERT INTO num_input_test(n1) VALUES ('
 INSERT INTO num_input_test(n1) VALUES ('+NaN');
 INSERT INTO num_input_test(n1) VALUES ('-NaN');
 INSERT INTO num_input_test(n1) VALUES ('+ infinity');
+INSERT INTO num_input_test(n1) VALUES ('_123');
+INSERT INTO num_input_test(n1) VALUES ('123_');
+INSERT INTO num_input_test(n1) VALUES ('12__34');
+INSERT INTO num_input_test(n1) VALUES ('123_.456');
+INSERT INTO num_input_test(n1) VALUES ('123._456');
+INSERT INTO num_input_test(n1) VALUES ('1.2e_34');
+INSERT INTO num_input_test(n1) VALUES ('1.2e34_');
+INSERT INTO num_input_test(n1) VALUES ('1.2e3__4');
 INSERT INTO num_input_test(n1) VALUES ('0b1112');
 INSERT INTO num_input_test(n1) VALUES ('0c1112');
 INSERT INTO num_input_test(n1) VALUES ('0o12345678');
 INSERT INTO num_input_test(n1) VALUES ('0x1eg');
 INSERT INTO num_input_test(n1) VALUES ('0x12.34');
+INSERT INTO num_input_test(n1) VALUES ('0x__1234');
+INSERT INTO num_input_test(n1) VALUES ('0x1234_');
+INSERT INTO num_input_test(n1) VALUES ('0x12__34');
 
 SELECT * FROM num_input_test;
 
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
new file mode 100644
index 310d9e5..1941c58
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -45,7 +45,6 @@ SELECT -0x8000000000000001;
 -- error cases
 SELECT 123abc;
 SELECT 0x0o;
-SELECT 1_2_3;
 SELECT 0.a;
 SELECT 0.0a;
 SELECT .0a;
@@ -66,6 +65,29 @@ SELECT 0x;
 SELECT 1x;
 SELECT 0x0y;
 
+-- underscores
+SELECT 1_000_000;
+SELECT 1_2_3;
+SELECT 0x1EEE_FFFF;
+SELECT 0o2_73;
+SELECT 0b_10_0101;
+
+SELECT 1_000.000_005;
+SELECT 1_000.;
+SELECT .000_005;
+SELECT 1_000.5e0_1;
+
+-- error cases
+SELECT _100;
+SELECT 100_;
+SELECT 100__000;
+
+SELECT _1_000.5;
+SELECT 1_000_.5;
+SELECT 1_000._5;
+SELECT 1_000.5_;
+SELECT 1_000.5e_1;
+
 
 --
 -- Test implicit type conversions
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
new file mode 100644
index d70bd86..fb0583f
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -283,16 +283,16 @@ explain (costs off) select * from like_o
 create table lparted_by_int2 (a smallint) partition by list (a);
 create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
 create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
-explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+explain (costs off) select * from lparted_by_int2 where a = 100_000_000_000_000;
 
 create table rparted_by_int2 (a smallint) partition by range (a);
 create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
 create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
 -- all partitions pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
 create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
 -- all partitions but rparted_by_int2_maxvalue pruned
-explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+explain (costs off) select * from rparted_by_int2 where a > 100_000_000_000_000;
 
 drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, boolrangep, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Dean Rasheed (#7)

Re: Underscores in numeric literals

On 23.01.23 21:45, Dean Rasheed wrote:

On Wed, 4 Jan 2023 at 09:28, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

In addition, I think that strip_underscores() could then go away if
numeric_in() were made to handle underscores.

Essentially then, that would move all responsibility for parsing
underscores and non-decimal integers to the datatype input functions,
or their support routines, rather than having it distributed.

Here's an update with those changes.

This looks good to me.

Did you have any thoughts about what to do with the float types? I
guess we could handle those in a separate patch?

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Peter Eisentraut (#8)

Re: Underscores in numeric literals

On Tue, 31 Jan 2023 at 15:28, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Did you have any thoughts about what to do with the float types? I
guess we could handle those in a separate patch?

I was assuming that we'd do nothing for float types, because anything
we did would necessarily impact their performance.

Regards,
Dean

#10

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Dean Rasheed (#9)

Re: Underscores in numeric literals

On 31.01.23 17:09, Dean Rasheed wrote:

On Tue, 31 Jan 2023 at 15:28, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Did you have any thoughts about what to do with the float types? I
guess we could handle those in a separate patch?

I was assuming that we'd do nothing for float types, because anything
we did would necessarily impact their performance.

Yeah, as long as we are using strtof() and strtod() we should just leave
it alone. If we have break that open and hand-code something, we can
reconsider it.

So I think you could go ahead with committing your patch and we can
consider this topic done for now.

#11

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Peter Eisentraut (#10)

Re: Underscores in numeric literals

On Thu, 2 Feb 2023 at 22:40, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 31.01.23 17:09, Dean Rasheed wrote:

On Tue, 31 Jan 2023 at 15:28, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Did you have any thoughts about what to do with the float types? I
guess we could handle those in a separate patch?

I was assuming that we'd do nothing for float types, because anything
we did would necessarily impact their performance.

Yeah, as long as we are using strtof() and strtod() we should just leave
it alone. If we have break that open and hand-code something, we can
reconsider it.

So I think you could go ahead with committing your patch and we can
consider this topic done for now.

Done.

Regards,
Dean