Non-decimal integer literals

Started by Peter Eisentraut over 4 years ago, 62 messages

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

1 attachment(s)

Here is a patch to add support for hexadecimal, octal, and binary
integer literals:

0x42E
0o112
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.
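For readers following along, here is a minimal standalone sketch of the approach the patched integer input functions take. It skips an optional sign, recognizes a 0x/0o/0b prefix, then accumulates digits as a negative number so that the most negative value still fits. The names `digit_value` and `parse_int32_radix` are hypothetical, and the GCC/Clang `__builtin_mul_overflow`/`__builtin_sub_overflow` builtins stand in for PostgreSQL's `pg_mul_s32_overflow`/`pg_sub_s32_overflow` wrappers. This sketch also insists on at least one digit after the prefix, so a bare `0x` is rejected.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of c as a digit in the given base, or -1 if it is not one. */
static int
digit_value(unsigned char c, int base)
{
	int			v;

	if (c >= '0' && c <= '9')
		v = c - '0';
	else if (c >= 'a' && c <= 'f')
		v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		v = c - 'A' + 10;
	else
		return -1;
	return v < base ? v : -1;
}

/*
 * Hypothetical int32 parser accepting surrounding whitespace, an optional
 * sign and a 0x/0o/0b prefix.  Digits are accumulated negatively so that
 * INT32_MIN is representable.  Returns false on bad syntax or overflow.
 */
static bool
parse_int32_radix(const char *s, int32_t *result)
{
	const char *ptr = s;
	bool		neg = false;
	int			base = 10;
	int32_t		tmp = 0;
	int			d;

	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr == '-')
	{
		neg = true;
		ptr++;
	}
	else if (*ptr == '+')
		ptr++;

	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
		base = 16;
	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
		base = 8;
	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
		base = 2;
	if (base != 10)
		ptr += 2;

	/* require at least one digit, also after a radix prefix */
	if (digit_value((unsigned char) *ptr, base) < 0)
		return false;

	while ((d = digit_value((unsigned char) *ptr, base)) >= 0)
	{
		/* tmp = tmp * base - d, failing on overflow */
		if (__builtin_mul_overflow(tmp, base, &tmp) ||
			__builtin_sub_overflow(tmp, d, &tmp))
			return false;
		ptr++;
	}

	/* allow trailing whitespace, but not other trailing chars */
	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr != '\0')
		return false;

	if (!neg)
	{
		/* -INT32_MIN does not fit in an int32 */
		if (tmp == INT32_MIN)
			return false;
		tmp = -tmp;
	}
	*result = tmp;
	return true;
}
```

With this sketch, `"0x42E"`, `"0o112"` and `"0b100101"` yield 1070, 74 and 37, the same values as the regression tests in the patch.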

Those core parts are straightforward enough, but there are a bunch of
other places where integers are parsed, and one could consider in each
case whether they should get the same treatment, for example the
replication syntax lexer, or the input functions for oid, numeric, and
int2vector. There are also some opportunities to move some code around,
for example scanint8() could be in numutils.c. I have also looked with
some suspicion at some details of the number lexing in ecpg, but haven't
found anything I could break yet. Suggestions are welcome.
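If other input routines were to get the same treatment, the prefix detection could be factored out and shared. The following is a minimal sketch under stated assumptions: an unsigned 32-bit target in the spirit of oid input, with whitespace and sign handling omitted. `radix_prefix`, `digit_value` and `parse_uint32_radix` are hypothetical names, not existing PostgreSQL functions. It loops over digits by hand rather than calling `strtoul`, because `strtoul` in base 16 would accept a second `0x` after an already stripped prefix.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical shared helper: if p starts with a 0x/0o/0b prefix (either
 * case), set *base accordingly and return the prefix length; otherwise set
 * *base to 10 and return 0.
 */
static int
radix_prefix(const char *p, int *base)
{
	if (p[0] == '0')
	{
		switch (p[1])
		{
			case 'x':
			case 'X':
				*base = 16;
				return 2;
			case 'o':
			case 'O':
				*base = 8;
				return 2;
			case 'b':
			case 'B':
				*base = 2;
				return 2;
		}
	}
	*base = 10;
	return 0;
}

/* Value of c as a digit in the given base, or -1 if it is not one. */
static int
digit_value(unsigned char c, int base)
{
	int			v;

	if (c >= '0' && c <= '9')
		v = c - '0';
	else if (c >= 'a' && c <= 'f')
		v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		v = c - 'A' + 10;
	else
		return -1;
	return v < base ? v : -1;
}

/* Hypothetical oid-style parser for an unsigned 32-bit value. */
static bool
parse_uint32_radix(const char *s, uint32_t *result)
{
	int			base;
	const char *p = s + radix_prefix(s, &base);
	uint64_t	acc = 0;
	int			d;

	/* require at least one digit after any prefix */
	if (digit_value((unsigned char) *p, base) < 0)
		return false;

	while ((d = digit_value((unsigned char) *p, base)) >= 0)
	{
		/* acc <= UINT32_MAX here, so this cannot overflow 64 bits */
		acc = acc * (uint64_t) base + (uint64_t) d;
		if (acc > UINT32_MAX)
			return false;
		p++;
	}

	/* reject trailing junk, including a second "0x" */
	if (*p != '\0')
		return false;
	*result = (uint32_t) acc;
	return true;
}
```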

Attachments:

v1-0001-Non-decimal-integer-literals.patch (text/plain; charset=UTF-8)

From f2a9b37968a55bf91feb2b4753745c9f5a64be2e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 16 Aug 2021 09:32:14 +0200
Subject: [PATCH v1] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42E
    0o112
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.
---
 doc/src/sgml/syntax.sgml             | 26 ++++++++
 src/backend/catalog/sql_features.txt |  1 +
 src/backend/parser/scan.l            | 70 ++++++++++++++------
 src/backend/utils/adt/int8.c         | 54 ++++++++++++++++
 src/backend/utils/adt/numutils.c     | 97 ++++++++++++++++++++++++++++
 src/fe_utils/psqlscan.l              | 55 +++++++++++-----
 src/interfaces/ecpg/preproc/pgc.l    | 64 +++++++++++-------
 src/test/regress/expected/int2.out   | 19 ++++++
 src/test/regress/expected/int4.out   | 37 +++++++++++
 src/test/regress/expected/int8.out   | 19 ++++++
 src/test/regress/sql/int2.sql        |  7 ++
 src/test/regress/sql/int4.sql        | 11 ++++
 src/test/regress/sql/int8.sql        |  7 ++
 13 files changed, 412 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..8fb4b1228d 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o112
+0O755
+0x42e
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..83458ffb30 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -262,7 +262,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -341,7 +341,7 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
+
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -380,24 +380,41 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+integer			({decinteger}|{hexinteger}|{octinteger}|{bininteger})
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
 
-param			\${integer}
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 other			.
 
@@ -977,12 +994,22 @@ other			.
 					SET_YYLLOC();
 					return process_integer_literal(yytext, yylval);
 				}
-{decimal}		{
+{hexfail}		{
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
+					yyerror("invalid octal integer");
+				}
+{binfail}		{
+					yyerror("invalid binary integer");
+				}
+
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
@@ -996,7 +1023,7 @@ other			.
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is an {integer} or {numeric}.
 					 */
 					yyless(yyleng - 1);
 					SET_YYLLOC();
@@ -1296,7 +1323,7 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
@@ -1306,7 +1333,14 @@ process_integer_literal(const char *token, YYSTYPE *lval)
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	if (token[0] == '0' && (token[1] == 'X' || token[1] == 'x'))
+		val = strtoint(token + 2, &endptr, 16);
+	else if (token[0] == '0' && (token[1] == 'O' || token[1] == 'o'))
+		val = strtoint(token + 2, &endptr, 8);
+	else if (token[0] == '0' && (token[1] == 'B' || token[1] == 'b'))
+		val = strtoint(token + 2, &endptr, 2);
+	else
+		val = strtoint(token, &endptr, 10);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 2168080dcc..c3ed944a6c 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -45,6 +45,17 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * scanint8 --- try to parse a string into an int8.
  *
@@ -84,6 +95,48 @@ scanint8(const char *str, bool errorOK, int64 *result)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -92,6 +145,7 @@ scanint8(const char *str, bool errorOK, int64 *result)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index b93096f288..7c6520346e 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -173,6 +173,17 @@ pg_atoi(const char *s, int size, int c)
 	return (int32) l;
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -208,6 +219,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -216,6 +269,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -284,6 +338,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -292,6 +388,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0fab48a382..729aec562b 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -200,7 +200,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -279,7 +279,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -318,24 +317,41 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+integer			({decinteger}|{hexinteger}|{octinteger}|{bininteger})
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
 
-param			\${integer}
+param			\${decinteger}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -842,10 +858,19 @@ other			.
 {integer}		{
 					ECHO;
 				}
-{decimal}		{
+{hexfail}		{
+					ECHO;
+				}
+{octfail}		{
+					ECHO;
+				}
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
 					ECHO;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 7a0356638d..ebd1f3d7f4 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -305,7 +305,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -346,24 +345,41 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+integer			({decinteger}|{hexinteger}|{octinteger}|{bininteger})
 
-param			\${integer}
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -393,9 +409,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -408,7 +421,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -926,11 +939,11 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {integer}		{
 					return process_integer_literal(yytext, &base_yylval);
 				}
-{decimal}		{
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					return process_integer_literal(yytext, &base_yylval);
@@ -942,7 +955,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is an {integer} or {numeric}.
 					 */
 					yyless(yyleng - 1);
 					return process_integer_literal(yytext, &base_yylval);
@@ -1009,7 +1022,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
+<C>{hexinteger}		{
 						char* endptr;
 
 						errno = 0;
@@ -1546,7 +1559,7 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
@@ -1556,7 +1569,14 @@ process_integer_literal(const char *token, YYSTYPE *lval)
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	if (token[0] == '0' && (token[1] == 'X' || token[1] == 'x'))
+		val = strtoint(token + 2, &endptr, 16);
+	else if (token[0] == '0' && (token[1] == 'O' || token[1] == 'o'))
+		val = strtoint(token + 2, &endptr, 8);
+	else if (token[0] == '0' && (token[1] == 'B' || token[1] == 'b'))
+		val = strtoint(token + 2, &endptr, 2);
+	else
+		val = strtoint(token, &endptr, 10);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..0ffa00a835 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o112';
+ int2 
+------
+   74
+(1 row)
+
+SELECT int2 '0x42E';
+ int2 
+------
+ 1070
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..8c1e4237e8 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,40 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o112;
+ ?column? 
+----------
+       74
+(1 row)
+
+SELECT 0x42E;
+ ?column? 
+----------
+     1070
+(1 row)
+
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o112';
+ int4 
+------
+   74
+(1 row)
+
+SELECT int4 '0x42E';
+ int4 
+------
+ 1070
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..0a1c2ae216 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o112';
+ int8 
+------
+   74
+(1 row)
+
+SELECT int8 '0x42E';
+ int8 
+------
+ 1070
+(1 row)
+
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..c4410fa62d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o112';
+SELECT int2 '0x42E';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..c4da86ba33 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,14 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT 0b100101;
+SELECT 0o112;
+SELECT 0x42E;
+
+SELECT int4 '0b100101';
+SELECT int4 '0o112';
+SELECT int4 '0x42E';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..4cc4830bdc 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o112';
+SELECT int8 '0x42E';
-- 
2.32.0

John Naylor

john.naylor@enterprisedb.com

over 4 years ago

In reply to: Peter Eisentraut (#1)

Re: Non-decimal integer literals

On Mon, Aug 16, 2021 at 5:52 AM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

> Here is a patch to add support for hexadecimal, octal, and binary
> integer literals:
>
> 0x42E
> 0o112
> 0b100101
>
> per SQL:202x draft.
>
> This adds support in the lexer as well as in the integer type input
> functions.

The one thing that jumped out at me on a cursory reading is the {integer}
rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token text to
figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?
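The idea here is that each flex rule already knows which radix it matched, so its action can strip the prefix and pass the base, instead of one function re-inspecting the token text. Below is a minimal sketch of the value-conversion half, using standard `strtol` in place of PostgreSQL's `strtoint`; `integer_literal_value` is a hypothetical name. It only reports whether the digits fit in an `int`. Any fallback is left to the caller, which still holds the full token including its prefix, since the stripped digits alone no longer say which radix they were written in.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper for the lexer actions.  "digits" is the token with
 * any radix prefix already stripped (the flex pattern guarantees it holds
 * only digits valid for "base").  Returns true and sets *ival if the value
 * fits in an int; returns false on overflow or leftover characters.
 */
static bool
integer_literal_value(const char *digits, int base, int *ival)
{
	char	   *endptr;
	long		val;

	errno = 0;
	val = strtol(digits, &endptr, base);
	if (*endptr != '\0' || errno == ERANGE ||
		val > INT_MAX || val < INT_MIN)
		return false;
	*ival = (int) val;
	return true;
}
```

In a `{hexinteger}` action this would be called with `yytext + 2` and base 16, while `{decinteger}` would pass `yytext` and 10, mirroring how the rules are split per radix.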

--
John Naylor
EDB: http://www.enterprisedb.com

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: John Naylor (#2)

1 attachment(s)

Re: Non-decimal integer literals

On 16.08.21 17:32, John Naylor wrote:

> The one thing that jumped out at me on a cursory reading is
> the {integer} rule, which seems to be used nowhere except to
> call process_integer_literal, which must then inspect the token text to
> figure out what type of integer it is. Maybe consider 4 separate
> process_*_literal functions?

Agreed, that can be done in a simpler way. Here is an updated patch.

Attachments:

v2-0001-Non-decimal-integer-literals.patch (text/plain; charset=UTF-8)

From f90826f77d8067a1641f60dd75d5ea1d83466ea9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 7 Sep 2021 13:10:18 +0200
Subject: [PATCH v2] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42E
    0o112
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml             | 26 ++++++++
 src/backend/catalog/sql_features.txt |  1 +
 src/backend/parser/scan.l            | 87 ++++++++++++++++++-------
 src/backend/utils/adt/int8.c         | 54 ++++++++++++++++
 src/backend/utils/adt/numutils.c     | 97 ++++++++++++++++++++++++++++
 src/fe_utils/psqlscan.l              | 55 +++++++++++-----
 src/interfaces/ecpg/preproc/pgc.l    | 64 +++++++++++-------
 src/test/regress/expected/int2.out   | 19 ++++++
 src/test/regress/expected/int4.out   | 37 +++++++++++
 src/test/regress/expected/int8.out   | 19 ++++++
 src/test/regress/sql/int2.sql        |  7 ++
 src/test/regress/sql/int4.sql        | 11 ++++
 src/test/regress/sql/int8.sql        |  7 ++
 13 files changed, 422 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..8fb4b1228d 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o112
+0O755
+0x42e
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..a78fe7a2ed 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -262,7 +262,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -341,7 +341,7 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
+
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -380,24 +380,39 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
+ *
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
 
-param			\${integer}
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 other			.
 
@@ -973,20 +988,42 @@ other			.
 					return PARAM;
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext + 2, yylval, 16);
 				}
-{decimal}		{
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail}		{
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
+					yyerror("invalid octal integer");
+				}
+{binfail}		{
+					yyerror("invalid binary integer");
+				}
+
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -996,17 +1033,17 @@ other			.
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 
 
@@ -1296,17 +1333,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 2168080dcc..c3ed944a6c 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -45,6 +45,17 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * scanint8 --- try to parse a string into an int8.
  *
@@ -84,6 +95,48 @@ scanint8(const char *str, bool errorOK, int64 *result)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -92,6 +145,7 @@ scanint8(const char *str, bool errorOK, int64 *result)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index b93096f288..7c6520346e 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -173,6 +173,17 @@ pg_atoi(const char *s, int size, int c)
 	return (int32) l;
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -208,6 +219,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -216,6 +269,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -284,6 +338,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -292,6 +388,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0fab48a382..729aec562b 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -200,7 +200,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -279,7 +279,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -318,24 +317,41 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+integer			({decinteger}|{hexinteger}|{octinteger}|{bininteger})
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
 
-param			\${integer}
+param			\${decinteger}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -842,10 +858,19 @@ other			.
 {integer}		{
 					ECHO;
 				}
-{decimal}		{
+{hexfail}		{
+					ECHO;
+				}
+{octfail}		{
+					ECHO;
+				}
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
 					ECHO;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 7a0356638d..ebd1f3d7f4 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -305,7 +305,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -346,24 +345,41 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+integer			({decinteger}|{hexinteger}|{octinteger}|{bininteger})
 
-param			\${integer}
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -393,9 +409,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -408,7 +421,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -926,11 +939,11 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {integer}		{
 					return process_integer_literal(yytext, &base_yylval);
 				}
-{decimal}		{
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					return process_integer_literal(yytext, &base_yylval);
@@ -942,7 +955,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is an {integer} or {numeric}.
 					 */
 					yyless(yyleng - 1);
 					return process_integer_literal(yytext, &base_yylval);
@@ -1009,7 +1022,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
+<C>{hexinteger}		{
 						char* endptr;
 
 						errno = 0;
@@ -1546,7 +1559,7 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
@@ -1556,7 +1569,14 @@ process_integer_literal(const char *token, YYSTYPE *lval)
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	if (token[0] == '0' && (token[1] == 'X' || token[1] == 'x'))
+		val = strtoint(token + 2, &endptr, 16);
+	else if (token[0] == '0' && (token[1] == 'O' || token[1] == 'o'))
+		val = strtoint(token + 2, &endptr, 8);
+	else if (token[0] == '0' && (token[1] == 'B' || token[1] == 'b'))
+		val = strtoint(token + 2, &endptr, 2);
+	else
+		val = strtoint(token, &endptr, 10);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..0ffa00a835 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o112';
+ int2 
+------
+   74
+(1 row)
+
+SELECT int2 '0x42E';
+ int2 
+------
+ 1070
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..8c1e4237e8 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,40 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o112;
+ ?column? 
+----------
+       74
+(1 row)
+
+SELECT 0x42E;
+ ?column? 
+----------
+     1070
+(1 row)
+
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o112';
+ int4 
+------
+   74
+(1 row)
+
+SELECT int4 '0x42E';
+ int4 
+------
+ 1070
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..0a1c2ae216 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o112';
+ int8 
+------
+   74
+(1 row)
+
+SELECT int8 '0x42E';
+ int8 
+------
+ 1070
+(1 row)
+
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..c4410fa62d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o112';
+SELECT int2 '0x42E';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..c4da86ba33 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,14 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT 0b100101;
+SELECT 0o112;
+SELECT 0x42E;
+
+SELECT int4 '0b100101';
+SELECT int4 '0o112';
+SELECT int4 '0x42E';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..4cc4830bdc 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o112';
+SELECT int8 '0x42E';
-- 
2.33.0

Zhihong Yu

zyu@yugabyte.com

over 4 years ago

In reply to: Peter Eisentraut (#3)

Re: Non-decimal integer literals

On Tue, Sep 7, 2021 at 4:13 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 16.08.21 17:32, John Naylor wrote:

The one thing that jumped out at me on a cursory reading is
the {integer} rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token text to
figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?

Agreed, that can be done in a simpler way. Here is an updated patch.

Hi,
Minor comment:

+SELECT int4 '0o112';

Maybe include digits up to 7 in the octal test case.

Thanks
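The test value 0o112 only exercises the octal digits 0, 1 and 2, so a mistake in handling the digits 3 through 7 would go unnoticed. Expected outputs for test values like these can be cross-checked against the C library's strtol(). The helper below has the hypothetical name radix_value(). It strips the SQL-style prefix before calling strtol(), because strtol() does not accept a 0o prefix, and its treatment of 0b depends on the C standard version. It is a sketch without error or overflow checking.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: evaluate an integer literal with an optional
 * 0x/0o/0b prefix by stripping the prefix and passing the remaining
 * digits to strtol() with the matching base.  No error checking.
 */
static long
radix_value(const char *literal)
{
	int			base = 10;

	assert(literal != NULL);
	if (literal[0] == '0')
	{
		switch (literal[1])
		{
			case 'x':
			case 'X':
				base = 16;
				break;
			case 'o':
			case 'O':
				base = 8;
				break;
			case 'b':
			case 'B':
				base = 2;
				break;
		}
		if (base != 10)
			literal += 2;	/* skip the radix prefix */
	}
	return strtol(literal, NULL, base);
}
```

With this helper, radix_value("0o273") gives 187 (2*64 + 7*8 + 3), and radix_value("0x42F") gives 1071.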

Vik Fearing

vik@postgresfriends.org

over 4 years ago

In reply to: Peter Eisentraut (#1)

Re: Non-decimal integer literals

On 8/16/21 11:51 AM, Peter Eisentraut wrote:

Here is a patch to add support for hexadecimal, octal, and binary
integer literals:

    0x42E
    0o112
    0b100101

per SQL:202x draft.

Is there any hope of adding the optional underscores? I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

(The standard does not allow identifiers to begin with _ but we do...)
--
Vik Fearing
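A digit-group separator changes how far a number token extends. The following is a minimal sketch of one possible rule, under the assumption that "_" may appear only between two digits (no leading, trailing, or doubled underscores). The function name scan_grouped_decinteger() is hypothetical, and overflow is ignored. Under this rule "1_000_000" is a single token, but "1_a" still stops after "1". That leaves "_a" to be lexed as an identifier, which is why rejecting such input outright might be preferable.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical scanner for a decimal integer with optional digit-group
 * underscores, where "_" is accepted only between two digits.
 * Returns the number of characters belonging to the numeric token (0 if
 * the input does not start with a digit) and stores the value when the
 * result is nonzero.  Overflow is not checked in this sketch.
 */
static size_t
scan_grouped_decinteger(const char *s, unsigned long *value)
{
	size_t		i = 0;
	unsigned long v = 0;

	assert(s != NULL && value != NULL);

	if (!isdigit((unsigned char) s[0]))
		return 0;

	for (;;)
	{
		if (isdigit((unsigned char) s[i]))
		{
			v = v * 10 + (unsigned long) (s[i] - '0');
			i++;
		}
		else if (s[i] == '_' && isdigit((unsigned char) s[i + 1]))
			i++;				/* separator between two digits: skip it */
		else
			break;				/* anything else ends the token */
	}
	*value = v;
	return i;
}
```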

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Vik Fearing (#5)

Re: Non-decimal integer literals

Vik Fearing <vik@postgresfriends.org> writes:

On 8/16/21 11:51 AM, Peter Eisentraut wrote:

Here is a patch to add support for hexadecimal, octal, and binary
integer literals:

    0x42E
    0o112
    0b100101

per SQL:202x draft.

Is there any hope of adding the optional underscores? I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today, e.g.

regression=# select 0x42e;
 x42e
------
    0
(1 row)

AFAIR we've seen exactly zero field demand for this feature,
so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

regards, tom lane
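The breakage follows from flex's longest-match rule. Before the patch, the only number rule that can start "0x42e" is {integer} ({digit}+), which matches just "0". That leaves "x42e" to be lexed as an identifier, which the grammar then accepts as a column alias. With the patch, {hexinteger} matches all five characters, and a bare "0x" with no hex digit after it hits {hexfail}. The sketch below uses hypothetical function names and models only the hex case. It reports how long the numeric token would be under each set of rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the numeric token under the pre-patch rule {digit}+. */
static size_t
old_number_length(const char *s)
{
	size_t		n = 0;

	assert(s != NULL);
	while (isdigit((unsigned char) s[n]))
		n++;
	return n;
}

/*
 * Length of the numeric token once {hexinteger} and {hexfail} exist.
 * Returns -1 where {hexfail} (a "0x" not followed by a hex digit) would
 * raise "invalid hexadecimal integer".  Octal and binary are not modelled.
 */
static long
new_number_length(const char *s)
{
	assert(s != NULL);
	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
	{
		size_t		n = 2;

		while (isxdigit((unsigned char) s[n]))
			n++;
		return n > 2 ? (long) n : -1;
	}
	return (long) old_number_length(s);
}
```

So old_number_length("0x42e") is 1 (the query returns 0 under alias x42e), while new_number_length("0x42e") is 5 (the query returns 1070).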

Vik Fearing

vik@postgresfriends.org

over 4 years ago

In reply to: Tom Lane (#6)

Re: Non-decimal integer literals

On 9/8/21 3:14 PM, Tom Lane wrote:

Vik Fearing <vik@postgresfriends.org> writes:

Is there any hope of adding the optional underscores? I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today,

Ah, true that. So if this does go in, we may as well add the
underscores at the same time.

AFAIR we've seen exactly zero field demand for this feature,

I have often wanted something like this, even if I didn't bring it up on
this list. I have had customers who have wanted this, too. My response
has always been to show these exact problems to explain why it's not
possible, but if it's going to be in the standard then I favor doing it.

I have never really had a use for octal, but sometimes binary and hex
make things much clearer. Having a grouping separator for large numbers
is even more useful.

so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

I don't really see a hurry here. I am fine with waiting until the draft
becomes final.
--
Vik Fearing
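The v3 patch later in this thread gives a concrete instance of hex making code clearer. In information_schema.sql, it rewrites the typmod masks from 65535 to 0xFFFF. Those SQL functions subtract 4 from the numeric typmod, then read the precision from the upper 16 bits and the scale from the lower 16 bits. Below is a sketch of the same decoding in C. All function names are hypothetical. make_numeric_typmod() is an assumed inverse, chosen only to be consistent with the two decoders, and the -1 ("no typmod") case is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors _pg_numeric_precision(): (($2 - 4) >> 16) & 0xFFFF */
static int32_t
numeric_typmod_precision(int32_t typmod)
{
	assert(typmod >= 4);
	return ((typmod - 4) >> 16) & 0xFFFF;
}

/* Mirrors _pg_numeric_scale(): ($2 - 4) & 0xFFFF */
static int32_t
numeric_typmod_scale(int32_t typmod)
{
	assert(typmod >= 4);
	return (typmod - 4) & 0xFFFF;
}

/* Hypothetical inverse of the two functions above. */
static int32_t
make_numeric_typmod(int32_t precision, int32_t scale)
{
	assert(precision >= 0 && precision <= 0x7FFF);
	assert(scale >= 0 && scale <= 0xFFFF);
	return ((precision << 16) | scale) + 4;
}
```

For numeric(10,2) this encodes to 655366 (0x000A0006), and the masks recover 10 and 2.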

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Zhihong Yu (#4)

1 attachment(s)

Re: Non-decimal integer literals

On 07.09.21 13:50, Zhihong Yu wrote:

On 16.08.21 17:32, John Naylor wrote:

The one thing that jumped out at me on a cursory reading is
the {integer} rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token text to
figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?

Agreed, that can be done in a simpler way. Here is an updated patch.

Hi,
Minor comment:

+SELECT int4 '0o112';

Maybe include digits up to 7 in the octal test case.

Good point, here is a lightly updated patch.

Attachments:

v3-0001-Non-decimal-integer-literals.patch (text/plain; charset=UTF-8)

From 43957a1f48ed6f750f231ef8e3533d74d7ac4cc9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 28 Sep 2021 17:14:44 +0200
Subject: [PATCH v3] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   | 26 ++++++
 src/backend/catalog/information_schema.sql |  6 +-
 src/backend/catalog/sql_features.txt       |  1 +
 src/backend/parser/scan.l                  | 87 +++++++++++++------
 src/backend/utils/adt/int8.c               | 54 ++++++++++++
 src/backend/utils/adt/numutils.c           | 97 ++++++++++++++++++++++
 src/fe_utils/psqlscan.l                    | 55 ++++++++----
 src/interfaces/ecpg/preproc/pgc.l          | 64 +++++++++-----
 src/test/regress/expected/int2.out         | 19 +++++
 src/test/regress/expected/int4.out         | 37 +++++++++
 src/test/regress/expected/int8.out         | 19 +++++
 src/test/regress/sql/int2.sql              |  7 ++
 src/test/regress/sql/int4.sql              | 11 +++
 src/test/regress/sql/int8.sql              |  7 ++
 14 files changed, 425 insertions(+), 65 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 11d9dd60c2..ce88c483a2 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..a78fe7a2ed 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -262,7 +262,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -341,7 +341,7 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
+
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -380,24 +380,39 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
+ *
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
 
-param			\${integer}
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 other			.
 
@@ -973,20 +988,42 @@ other			.
 					return PARAM;
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext + 2, yylval, 16);
 				}
-{decimal}		{
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail}		{
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
+					yyerror("invalid octal integer");
+				}
+{binfail}		{
+					yyerror("invalid binary integer");
+				}
+
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -996,17 +1033,17 @@ other			.
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 
 
@@ -1296,17 +1333,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 2168080dcc..c3ed944a6c 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -45,6 +45,17 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * scanint8 --- try to parse a string into an int8.
  *
@@ -84,6 +95,48 @@ scanint8(const char *str, bool errorOK, int64 *result)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -92,6 +145,7 @@ scanint8(const char *str, bool errorOK, int64 *result)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index b93096f288..7c6520346e 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -173,6 +173,17 @@ pg_atoi(const char *s, int size, int c)
 	return (int32) l;
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -208,6 +219,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -216,6 +269,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -284,6 +338,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -292,6 +388,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0fab48a382..729aec562b 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -200,7 +200,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -279,7 +279,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -318,24 +317,41 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+integer			({decinteger}|{hexinteger}|{octinteger}|{bininteger})
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
 
-param			\${integer}
+param			\${decinteger}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -842,10 +858,19 @@ other			.
 {integer}		{
 					ECHO;
 				}
-{decimal}		{
+{hexfail}		{
+					ECHO;
+				}
+{octfail}		{
+					ECHO;
+				}
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
 					ECHO;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 7a0356638d..ebd1f3d7f4 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -305,7 +305,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -346,24 +345,41 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+integer			({decinteger}|{hexinteger}|{octinteger}|{bininteger})
 
-param			\${integer}
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -393,9 +409,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -408,7 +421,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -926,11 +939,11 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {integer}		{
 					return process_integer_literal(yytext, &base_yylval);
 				}
-{decimal}		{
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					return process_integer_literal(yytext, &base_yylval);
@@ -942,7 +955,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is an {integer} or {numeric}.
 					 */
 					yyless(yyleng - 1);
 					return process_integer_literal(yytext, &base_yylval);
@@ -1009,7 +1022,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
+<C>{hexinteger}		{
 						char* endptr;
 
 						errno = 0;
@@ -1546,7 +1559,7 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
@@ -1556,7 +1569,14 @@ process_integer_literal(const char *token, YYSTYPE *lval)
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	if (token[0] == '0' && (token[1] == 'X' || token[1] == 'x'))
+		val = strtoint(token + 2, &endptr, 16);
+	else if (token[0] == '0' && (token[1] == 'O' || token[1] == 'o'))
+		val = strtoint(token + 2, &endptr, 8);
+	else if (token[0] == '0' && (token[1] == 'B' || token[1] == 'b'))
+		val = strtoint(token + 2, &endptr, 2);
+	else
+		val = strtoint(token, &endptr, 10);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..220e1493e8 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..bb23331c3e 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,40 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..edd15a4353 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..0dee22fe6d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..3b214cdb65 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,14 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..b7ad696dd8 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
-- 
2.33.0
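
The prefix branches added to pg_strtoint16(), pg_strtoint32() and scanint8() above all use the same technique. They detect the radix prefix, then accumulate the value as a negative number with an overflow check before each step, so that the most negative value can be represented. Below is a standalone C sketch of that technique for a 32-bit result. It is not the code in the patch: `parse_int32()` and `digit_value()` are hypothetical names, and an `int64_t` intermediate stands in for the `pg_mul_s32_overflow()`/`pg_sub_s32_overflow()` helpers. The sketch also rejects a bare prefix such as `0x` with no digits after it.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of c as a digit in the given base, or -1 if it is not one. */
static int
digit_value(char c, int base)
{
	int			v;

	if (c >= '0' && c <= '9')
		v = c - '0';
	else if (c >= 'a' && c <= 'f')
		v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		v = c - 'A' + 10;
	else
		return -1;
	return (v < base) ? v : -1;
}

/*
 * Hypothetical parse_int32(): parse an optionally signed decimal, 0x, 0o or
 * 0b integer.  Leading and trailing whitespace is allowed; anything else
 * after the digits is rejected.  Returns false on bad syntax or overflow.
 */
static bool
parse_int32(const char *s, int32_t *result)
{
	const char *ptr = s;
	const char *firstdigit;
	bool		neg = false;
	int			base = 10;
	int			d;
	int64_t		tmp = 0;		/* stays within [INT32_MIN, 0] between steps */

	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr == '-' || *ptr == '+')
		neg = (*ptr++ == '-');

	/* radix prefix, accepted in either case */
	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
		base = 16;
	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
		base = 8;
	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
		base = 2;
	if (base != 10)
		ptr += 2;

	firstdigit = ptr;
	while ((d = digit_value(*ptr, base)) >= 0)
	{
		tmp = tmp * base - d;	/* cannot overflow int64_t */
		if (tmp < INT32_MIN)
			return false;		/* out of range */
		ptr++;
	}
	if (ptr == firstdigit)
		return false;			/* no digits, e.g. "" or "0x" */

	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr != '\0')
		return false;			/* trailing junk */

	if (!neg)
	{
		if (tmp == INT32_MIN)
			return false;		/* -INT32_MIN does not fit in int32 */
		tmp = -tmp;
	}
	*result = (int32_t) tmp;
	return true;
}
```

Accumulating downward is what lets `-0x80000000` parse to INT32_MIN while `0x80000000` is reported as out of range.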

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Vik Fearing (#7)

Re: Non-decimal integer literals

On 09.09.21 16:08, Vik Fearing wrote:

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today,

Ah, true that. So if this does go in, we may as well add the
underscores at the same time.

Yeah, looks like I'll need to look into the identifier lexing issues
previously discussed. I'll attack that during the next commit fest.

so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

I don't really see a hurry here. I am fine with waiting until the draft
becomes final.

Right, the point is to explore this now so that it can be ready when the
standard is ready.
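
The compatibility concern quoted above can be made concrete. Under the pre-patch rules, a digit run ends at the first non-digit, and the rest is lexed as an identifier. The toy tokenizer below is a sketch of just those two rules, `{digit}+` and `{ident_start}{ident_cont}*`, with ASCII-only character classes. It is not the real scan.l, and `old_tokenize()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only stand-ins for scan.l's {ident_start} and {ident_cont}. */
static bool
is_ident_start(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static bool
is_ident_cont(char c)
{
	return is_ident_start(c) || (c >= '0' && c <= '9') || c == '$';
}

/*
 * Toy tokenizer with only the pre-patch number rule ({digit}+) and the
 * identifier rule ({ident_start}{ident_cont}*).  Spaces separate tokens;
 * any other character is a one-character token.  Tokens are copied into
 * toks[] (truncated to 31 bytes); the token count is returned.
 */
static int
old_tokenize(const char *s, char toks[][32], int maxtoks)
{
	int			n = 0;

	while (*s != '\0' && n < maxtoks)
	{
		const char *start = s;
		size_t		len;

		if (*s == ' ')
		{
			s++;
			continue;
		}

		if (*s >= '0' && *s <= '9')
		{
			/* an integer stops at the first non-digit, e.g. the 'x' of 0x42F */
			while (*s >= '0' && *s <= '9')
				s++;
		}
		else if (is_ident_start(*s))
		{
			while (is_ident_cont(*s))
				s++;
		}
		else
			s++;

		len = (size_t) (s - start);
		if (len > 31)
			len = 31;
		memcpy(toks[n], start, len);
		toks[n][len] = '\0';
		n++;
	}
	return n;
}
```

Under these rules, `SELECT 0x42F` tokenizes as `SELECT`, `0`, `x42F`. That is, it returns the constant 0 with column alias x42f, which is exactly the kind of currently valid query that the new literal syntax reinterprets.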

#10

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: Peter Eisentraut (#9)

1 attachment(s)

Re: Non-decimal integer literals

On 28.09.21 17:30, Peter Eisentraut wrote:

On 09.09.21 16:08, Vik Fearing wrote:

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today,

Ah, true that. So if this does go in, we may as well add the
underscores at the same time.

Yeah, looks like I'll need to look into the identifier lexing issues
previously discussed. I'll attack that during the next commit fest.

Here is an updated patch for this. It's the previous patch polished a
bit more, and it contains changes so that numeric literals reject
trailing identifier parts without whitespace in between, as discussed.
Maybe I should split that into incremental patches, but for now I only
have the one. I don't have a patch for the underscores in numeric
literals yet. It's in progress, but not ready.
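
The v4 scan.l hunks below add `{hexfail0}`/`{octfail0}`/`{binfail0}` for a prefix with no digits, and `{decfail}`/`{hexfail}`/`{octfail}`/`{binfail}` for an integer immediately followed by an `{ident_start}` character. Here is a sketch of how those rules classify an integer token, modelling flex's longest match and its first-listed-rule tie-break. It is a toy covering integers only, with no decimal points or exponents. `scan_integer()`, `lit_status` and the ASCII-only identifier class are hypothetical simplifications, not part of the patch.

```c
#include <stddef.h>

typedef enum
{
	LIT_OK,						/* a complete integer literal */
	LIT_BAD_PREFIX,				/* like {hexfail0}: prefix with no valid digit */
	LIT_TRAILING_JUNK			/* like {decfail}/{hexfail}: identifier char follows */
} lit_status;

/* ASCII-only stand-in for {ident_start}; the real class also has \200-\377. */
static int
is_ident_start(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* Is c a valid digit in the given base (2, 8, 10 or 16)? */
static int
digit_ok(char c, int base)
{
	if (base == 16)
		return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
			(c >= 'A' && c <= 'F');
	return c >= '0' && c < '0' + base;
}

/*
 * Classify the integer literal at the start of s, which must begin with a
 * decimal digit.  *base receives the radix and *len the length of the
 * literal proper.
 */
static lit_status
scan_integer(const char *s, int *base, size_t *len)
{
	const char *p = s;

	*base = 10;
	if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'))
		*base = 16;
	else if (p[0] == '0' && (p[1] == 'o' || p[1] == 'O'))
		*base = 8;
	else if (p[0] == '0' && (p[1] == 'b' || p[1] == 'B'))
		*base = 2;

	if (*base != 10)
	{
		p += 2;
		if (!digit_ok(*p, *base))
		{
			/* {hexfail0} etc. are listed before {decfail}, so they win the tie */
			*len = 2;
			return LIT_BAD_PREFIX;
		}
	}

	while (digit_ok(*p, *base))
		p++;
	*len = (size_t) (p - s);

	/* an identifier character glued to the digits is rejected, not split off */
	if (is_ident_start(*p))
		return LIT_TRAILING_JUNK;
	return LIT_OK;
}
```

With these rules, `SELECT 0x42F` yields 1071. Meanwhile `SELECT 123abc`, previously `SELECT 123 AS abc`, becomes an error instead of silently changing meaning. A non-identifier character such as the `2` in `0b102` still ends the literal, leaving `0b10` followed by `2`.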

Attachments:

v4-0001-Non-decimal-integer-literals.patch (text/plain)

From 6e081c44c04201ee9ded9dc6b689824ccabdfc28 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sun, 31 Oct 2021 15:42:18 +0100
Subject: [PATCH v4] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  26 ++++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/scan.l                  | 103 ++++++++++++++++-----
 src/backend/utils/adt/int8.c               |  54 +++++++++++
 src/backend/utils/adt/numutils.c           |  97 +++++++++++++++++++
 src/fe_utils/psqlscan.l                    |  81 ++++++++++++----
 src/interfaces/ecpg/preproc/pgc.l          |  95 +++++++++++--------
 src/test/regress/expected/int2.out         |  19 ++++
 src/test/regress/expected/int4.out         |  75 +++++++++++++++
 src/test/regress/expected/int8.out         |  19 ++++
 src/test/regress/sql/int2.sql              |   7 ++
 src/test/regress/sql/int4.sql              |  26 ++++++
 src/test/regress/sql/int8.sql              |   7 ++
 14 files changed, 531 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 11d9dd60c2..ce88c483a2 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..fe5ddbe2aa 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -262,7 +262,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -341,7 +341,7 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
+
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -380,24 +380,44 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail0		0[xX]
+octfail0		0[oO]
+binfail0		0[bB]
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+decfail         {decinteger}{ident_start}
+hexfail			{hexinteger}{ident_start}
+octfail			{octinteger}{ident_start}
+binfail			{bininteger}{ident_start}
 
-param			\${integer}
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 other			.
 
@@ -973,20 +993,53 @@ other			.
 					return PARAM;
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail0}		{
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail0}		{
+					yyerror("invalid octal integer");
+				}
+{binfail0}		{
+					yyerror("invalid binary integer");
+				}
+{decfail}		{
+					yyerror("trailing junk after decimal integer");
+				}
+{hexfail}		{
+					yyerror("trailing junk after hexadecimal integer");
+				}
+{octfail}		{
+					yyerror("trailing junk after octal integer");
+				}
+{binfail}		{
+					yyerror("trailing junk after binary integer");
 				}
-{decimal}		{
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -996,17 +1049,17 @@ other			.
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 
 
@@ -1296,17 +1349,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 2168080dcc..c3ed944a6c 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -45,6 +45,17 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * scanint8 --- try to parse a string into an int8.
  *
@@ -84,6 +95,48 @@ scanint8(const char *str, bool errorOK, int64 *result)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -92,6 +145,7 @@ scanint8(const char *str, bool errorOK, int64 *result)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index b93096f288..7c6520346e 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -173,6 +173,17 @@ pg_atoi(const char *s, int size, int c)
 	return (int32) l;
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -208,6 +219,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -216,6 +269,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -284,6 +338,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -292,6 +388,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0fab48a382..4436509d88 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -200,7 +200,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -279,7 +279,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -318,24 +317,44 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail0		0[xX]
+octfail0		0[oO]
+binfail0		0[bB]
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+decfail			{decinteger}{ident_start}
+hexfail			{hexinteger}{ident_start}
+octfail			{octinteger}{ident_start}
+binfail			{bininteger}{ident_start}
 
-param			\${integer}
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -839,13 +858,43 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail0}		{
+					ECHO;
+				}
+{octfail0}		{
+					ECHO;
+				}
+{binfail0}		{
+					ECHO;
+				}
+{decfail}		{
+					ECHO;
+				}
+{hexfail}		{
+					ECHO;
+				}
+{octfail}		{
+					ECHO;
+				}
+{binfail}		{
 					ECHO;
 				}
-{decimal}		{
+{numeric}		{
 					ECHO;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 7a0356638d..8d6e1cd76a 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -305,7 +305,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -346,24 +345,44 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail0		0[xX]
+octfail0		0[oO]
+binfail0		0[bB]
+
+decfail			{decinteger}{ident_start}
+hexfail			{hexinteger}{ident_start}
+octfail			{octinteger}{ident_start}
+binfail			{bininteger}{ident_start}
 
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
 
-param			\${integer}
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+param			\${decinteger}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -393,9 +412,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -408,7 +424,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -923,17 +939,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
-{decimal}		{
+{hexinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 16);
+				}
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -942,18 +961,25 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is an {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
+<SQL>{octinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 8);
+				}
+<SQL>{bininteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 2);
+				}
+
 <SQL>{
 :{identifier}((("->"|\.){identifier})|(\[{array}\]))*	{
 					base_yylval.str = mm_strdup(yytext+1);
@@ -1009,19 +1035,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *)yytext,&endptr,16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1546,17 +1559,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..220e1493e8 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..060b599705 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,78 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
+-- lexer literals
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- error cases
+SELECT 0b;
+ERROR:  invalid binary integer at or near "SELECT 0b"
+LINE 1: SELECT 0b;
+        ^
+SELECT 1b;
+ERROR:  trailing junk after decimal integer at or near "SELECT 1b"
+LINE 1: SELECT 1b;
+        ^
+SELECT 0b0x;
+ERROR:  trailing junk after binary integer at or near "SELECT 0b0x"
+LINE 1: SELECT 0b0x;
+        ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "SELECT 0o"
+LINE 1: SELECT 0o;
+        ^
+SELECT 1o;
+ERROR:  trailing junk after decimal integer at or near "SELECT 1o"
+LINE 1: SELECT 1o;
+        ^
+SELECT 0o0x;
+ERROR:  trailing junk after octal integer at or near "SELECT 0o0x"
+LINE 1: SELECT 0o0x;
+        ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "SELECT 0x"
+LINE 1: SELECT 0x;
+        ^
+SELECT 1x;
+ERROR:  trailing junk after decimal integer at or near "SELECT 1x"
+LINE 1: SELECT 1x;
+        ^
+SELECT 0x0y;
+ERROR:  trailing junk after hexadecimal integer at or near "SELECT 0x0y"
+LINE 1: SELECT 0x0y;
+        ^
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..edd15a4353 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..0dee22fe6d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..d97d017fca 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,29 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
+
+-- lexer literals
+
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- error cases
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..b7ad696dd8 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
-- 
2.33.1
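The patch above repeats the same prefix dispatch (0x/0o/0b, then digits accumulated as a negative number) in scanint8(), pg_strtoint16() and pg_strtoint32(). Below is a condensed standalone sketch of that logic. It assumes standard C99 <stdint.h> types in place of PostgreSQL's int64 and pg_*_overflow() helpers, and parse_int64() and digit_value() are hypothetical names, not PostgreSQL functions. Unlike the patch, which uses a hexlookup[] table and a separate loop per base, this version folds all bases into one loop. It also explicitly requires at least one digit after the prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of c as a digit in the given base, or -1 if it is not one. */
static int
digit_value(unsigned char c, int base)
{
    int         v;

    assert(base == 2 || base == 8 || base == 10 || base == 16);
    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'f')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        v = c - 'A' + 10;
    else
        return -1;
    return v < base ? v : -1;
}

/*
 * Hypothetical helper: parse str as an int64 with an optional sign and an
 * optional 0x/0o/0b prefix, allowing leading and trailing whitespace.
 * Returns false on invalid syntax or overflow.  The value is accumulated
 * as a negative number so that INT64_MIN is representable.
 */
static bool
parse_int64(const char *str, int64_t *result)
{
    const char *ptr = str;
    int64_t     tmp = 0;
    bool        neg = false;
    int         base = 10;
    int         d;

    while (isspace((unsigned char) *ptr))
        ptr++;
    if (*ptr == '-')
    {
        neg = true;
        ptr++;
    }
    else if (*ptr == '+')
        ptr++;

    if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
        base = 16;
    else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
        base = 8;
    else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
        base = 2;
    if (base != 10)
        ptr += 2;

    /* require at least one digit after any prefix */
    if (digit_value((unsigned char) *ptr, base) < 0)
        return false;

    while ((d = digit_value((unsigned char) *ptr, base)) >= 0)
    {
        /* tmp * base - d must stay >= INT64_MIN (C division truncates toward zero) */
        if (tmp < (INT64_MIN + d) / base)
            return false;
        tmp = tmp * base - d;
        ptr++;
    }

    /* allow trailing whitespace, but not other trailing chars */
    while (isspace((unsigned char) *ptr))
        ptr++;
    if (*ptr != '\0')
        return false;

    if (!neg)
    {
        if (tmp == INT64_MIN)   /* -INT64_MIN does not fit in int64_t */
            return false;
        tmp = -tmp;
    }
    *result = tmp;
    return true;
}
```

With this sketch, "0b100101", "0o273" and "0x42F" yield 37, 187 and 1071, matching the expected regression output in the patch. "-0x8000000000000000" is accepted as INT64_MIN, while its positive counterpart is rejected as out of range.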

#11

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: Peter Eisentraut (#10)

6 attachment(s)

Re: Non-decimal integer literals

On 01.11.21 07:09, Peter Eisentraut wrote:

> Here is an updated patch for this. It's the previous patch polished a
> bit more, and it contains changes so that numeric literals reject
> trailing identifier parts without whitespace in between, as discussed.
> Maybe I should split that into incremental patches, but for now I only
> have the one. I don't have a patch for the underscores in numeric
> literals yet. It's in progress, but not ready.

Here is a progressed version of this work, split into more incremental
patches. The first three patches are harmless code cleanups. Patch 3
has an interesting naming conflict, noted in the commit message; ideas
welcome. Patches 4 and 5 handle the rejection of trailing junk after
numeric literals, as discussed. I have expanded that compared to the v4
patch to also cover non-integer literals. It also comes with more tests
now. Patch 6 is the titular introduction of non-decimal integer
literals, unchanged from before.
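Patches 4 and 5 make the scanner reject identifier characters that immediately follow a number, and the expected int4 regression output earlier in the thread shows the resulting messages. The following is a hand-written approximation of how the flex rules ({hexinteger}, {hexfail0}, {hexfail} and their decimal, octal and binary counterparts) classify an integer token. It is written in plain C rather than flex. check_integer_literal() is a hypothetical name, and the sketch only covers integer literals, not {numeric} or {real}.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII letters and '_', plus any high-bit byte, as in ident_start. */
static int
is_ident_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        c == '_' || c >= 0x80;
}

static int
is_base_digit(unsigned char c, int base)
{
    assert(base == 2 || base == 8 || base == 10 || base == 16);
    if (base == 16)
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
            (c >= 'A' && c <= 'F');
    return c >= '0' && c < '0' + base;
}

/*
 * Examine the integer literal at the start of s.  Returns NULL if it is
 * well formed and stores its length in *len; otherwise returns an error
 * message worded like the regression output.
 */
static const char *
check_integer_literal(const char *s, size_t *len)
{
    static const char *const invalid[] = {
        "not an integer", "invalid hexadecimal integer",
        "invalid octal integer", "invalid binary integer"};
    static const char *const junk[] = {
        "trailing junk after decimal integer",
        "trailing junk after hexadecimal integer",
        "trailing junk after octal integer",
        "trailing junk after binary integer"};
    const char *p = s;
    const char *digits;
    int         kind = 0;       /* 0 = dec, 1 = hex, 2 = oct, 3 = bin */
    int         base = 10;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        kind = 1, base = 16;
    else if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O'))
        kind = 2, base = 8;
    else if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
        kind = 3, base = 2;
    if (kind != 0)
        p += 2;                 /* skip the prefix */

    digits = p;
    while (is_base_digit((unsigned char) *p, base))
        p++;

    if (p == digits)            /* like {hexfail0}: prefix without digits */
        return invalid[kind];
    if (is_ident_start((unsigned char) *p)) /* like {hexfail} etc. */
        return junk[kind];

    *len = (size_t) (p - s);
    return NULL;
}
```

flex picks the longest match, so "0b0x" is matched by {binfail} ({bininteger}{ident_start}, four characters) rather than by {bininteger} (three characters). The sketch detects the same case by looking one character past the digits.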

Attachments:

v5-0001-Improve-some-comments-in-scanner-files.patch

From 39aed9c0516fcf0a6b3372361ecfcf4874614578 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 24 Nov 2021 09:10:32 +0100
Subject: [PATCH v5 1/6] Improve some comments in scanner files

---
 src/backend/parser/scan.l         | 14 ++++++++------
 src/fe_utils/psqlscan.l           | 14 ++++++++------
 src/interfaces/ecpg/preproc/pgc.l | 16 +++++++++-------
 3 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..4e02815803 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -174,7 +174,7 @@ extern void core_yyset_column(int column_no, yyscan_t yyscanner);
  *  <xb> bit string literal
  *  <xc> extended C-style comments
  *  <xd> delimited identifiers (double-quoted identifiers)
- *  <xh> hexadecimal numeric string
+ *  <xh> hexadecimal byte string
  *  <xq> standard quoted strings
  *  <xqs> quote stop (detect continued strings)
  *  <xe> extended quoted strings (support backslash escape sequences)
@@ -262,7 +262,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -341,7 +341,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -380,15 +379,18 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
  * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+digit			[0-9]
 
 integer			{digit}+
 decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0fab48a382..9aac166aa0 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -112,7 +112,7 @@ extern void psql_yyset_column(int column_no, yyscan_t yyscanner);
  *  <xb> bit string literal
  *  <xc> extended C-style comments
  *  <xd> delimited identifiers (double-quoted identifiers)
- *  <xh> hexadecimal numeric string
+ *  <xh> hexadecimal byte string
  *  <xq> standard quoted strings
  *  <xqs> quote stop (detect continued strings)
  *  <xe> extended quoted strings (support backslash escape sequences)
@@ -200,7 +200,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -279,7 +279,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -318,15 +317,18 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
  * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+digit			[0-9]
 
 integer			{digit}+
 decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 7a0356638d..7c3bf52bfa 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -130,7 +130,7 @@ static struct _if_value
  *  <xc> extended C-style comments
  *  <xd> delimited identifiers (double-quoted identifiers)
  *  <xdc> double-quoted strings in C
- *  <xh> hexadecimal numeric string
+ *  <xh> hexadecimal byte string
  *  <xn> national character quoted strings
  *  <xq> standard quoted strings
  *  <xqs> quote stop (detect continued strings)
@@ -223,7 +223,7 @@ quotecontinuefail	{whitespace}*"-"?
 xbstart			[bB]{quote}
 xbinside		[^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart			[xX]{quote}
 xhinside		[^']*
 
@@ -305,7 +305,6 @@ xcstart			\/\*{op_chars}*
 xcstop			\*+\/
 xcinside		[^*/]+
 
-digit			[0-9]
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
 
@@ -346,15 +345,18 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
  * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+digit			[0-9]
 
 integer			{digit}+
 decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
@@ -603,7 +605,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return BCONST;
 						case xh:
 							if (literalbuf[strspn(literalbuf, "0123456789abcdefABCDEF")] != '\0')
-								mmerror(PARSE_ERROR, ET_ERROR, "invalid hex string literal");
+								mmerror(PARSE_ERROR, ET_ERROR, "invalid hexadecimal string literal");
 							base_yylval.str = psprintf("x'%s'", literalbuf);
 							return XCONST;
 						case xq:
-- 
2.33.1

v5-0002-Remove-unused-includes.patch

From ba0b3390a82901a3dad52267a7b1e36cc7be50b1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 24 Nov 2021 12:30:23 +0100
Subject: [PATCH v5 2/6] Remove unused includes

These haven't been needed for a long time.
---
 src/backend/utils/adt/cash.c       | 1 -
 src/backend/utils/adt/formatting.c | 1 -
 src/backend/utils/adt/numeric.c    | 1 -
 src/backend/utils/adt/rangetypes.c | 1 -
 4 files changed, 4 deletions(-)

diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index d093ce8038..f7e78fa105 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -26,7 +26,6 @@
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/cash.h"
-#include "utils/int8.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index a1145e2721..419469fab5 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -92,7 +92,6 @@
 #include "utils/datetime.h"
 #include "utils/float.h"
 #include "utils/formatting.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 1de744855f..644d7d3d21 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -39,7 +39,6 @@
 #include "utils/builtins.h"
 #include "utils/float.h"
 #include "utils/guc.h"
-#include "utils/int8.h"
 #include "utils/numeric.h"
 #include "utils/pg_lsn.h"
 #include "utils/sortsupport.h"
diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 815175a654..6c23d02c46 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -37,7 +37,6 @@
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/date.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/rangetypes.h"
 #include "utils/timestamp.h"
-- 
2.33.1
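Patch 3 drops scanint8()'s errorOK mode, so callers that only need a yes/no answer (make_const() and pgoutput's proto_version check in its diff) fall back to the C library's strto*() functions. The following is a minimal sketch of that idiom. Because the commit message leaves the real wrapper's name open (its FIXME), parse_ll_noerror() here is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: parse the whole string as a long long in the given
 * base and report failure instead of raising an error.
 */
static bool
parse_ll_noerror(const char *str, int base, long long *result)
{
    char       *endptr;
    long long   val;

    assert(base == 0 || (base >= 2 && base <= 36));  /* bases strtoll accepts */

    errno = 0;                  /* strtoll() sets errno on overflow but never clears it */
    val = strtoll(str, &endptr, base);
    if (errno != 0 || endptr == str || *endptr != '\0')
        return false;           /* overflow, no digits, or trailing junk */

    *result = val;
    return true;
}
```

Two details are easy to miss. First, errno must be cleared before the call, because strtoll() only ever sets it. Second, endptr == str must be checked, because an empty string otherwise parses as 0.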

v5-0003-Move-scanint8-to-numutils.c.patch

From ce0109193e49ef3cda84415ed5a117afd3c47521 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 24 Nov 2021 12:31:07 +0100
Subject: [PATCH v5 3/6] Move scanint8() to numutils.c

Move scanint8() to numutils.c and rename to pg_strtoint64().  We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent.  The API is also changed to no longer provide the errorOK
case.  Instead, provide another function for such uses that just wraps
the OS's strtoll() or similar.  We already have such a function for
the strtoull() variant, which is called pg_strtouint64(), so this
version will be called pg_strtoint64(), except we already used that
name above. FIXME
---
 src/backend/parser/parse_node.c             | 12 ++-
 src/backend/replication/pgoutput/pgoutput.c |  9 +-
 src/backend/utils/adt/int8.c                | 90 +------------------
 src/backend/utils/adt/numutils.c            | 97 +++++++++++++++++++++
 src/bin/pgbench/pgbench.c                   |  4 +-
 src/include/utils/builtins.h                |  2 +
 src/include/utils/int8.h                    | 25 ------
 7 files changed, 117 insertions(+), 122 deletions(-)
 delete mode 100644 src/include/utils/int8.h

diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 8cfe6f67c0..bb82439dc9 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -26,7 +26,6 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
@@ -353,7 +352,6 @@ make_const(ParseState *pstate, A_Const *aconst)
 {
 	Const	   *con;
 	Datum		val;
-	int64		val64;
 	Oid			typeid;
 	int			typelen;
 	bool		typebyval;
@@ -384,8 +382,15 @@ make_const(ParseState *pstate, A_Const *aconst)
 			break;
 
 		case T_Float:
+		{
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(aconst->val.fval.val, true, &val64))
+
+			int64		val64;
+			char	   *endptr;
+
+			errno = 0;
+			val64 = pg_strtoint64xx(aconst->val.fval.val, &endptr, 10);
+			if (!errno && *endptr == '\0')
 			{
 				/*
 				 * It might actually fit in int32. Probably only INT_MIN can
@@ -425,6 +430,7 @@ make_const(ParseState *pstate, A_Const *aconst)
 				typebyval = false;
 			}
 			break;
+		}
 
 		case T_String:
 
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 6f6a203dea..2f0f40c75d 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -21,7 +21,6 @@
 #include "replication/logicalproto.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
-#include "utils/int8.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -205,7 +204,8 @@ parse_output_parameters(List *options, PGOutputData *data)
 		/* Check each param, whether or not we recognize it */
 		if (strcmp(defel->defname, "proto_version") == 0)
 		{
-			int64		parsed;
+			unsigned long parsed;
+			char	   *endptr;
 
 			if (protocol_version_given)
 				ereport(ERROR,
@@ -213,12 +213,13 @@ parse_output_parameters(List *options, PGOutputData *data)
 						 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
 
-			if (!scanint8(strVal(defel->arg), true, &parsed))
+			parsed = strtoul(strVal(defel->arg), &endptr, 10);
+			if (errno || *endptr != '\0')
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid proto_version")));
 
-			if (parsed > PG_UINT32_MAX || parsed < 0)
+			if (parsed > PG_UINT32_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("proto_version \"%s\" out of range",
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 2168080dcc..f8f557526f 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -24,7 +24,6 @@
 #include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 
 
 typedef struct
@@ -45,99 +44,14 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is bad.
- * If errorOK is true, just return "false" for bad input.
- */
-bool
-scanint8(const char *str, bool errorOK, int64 *result)
-{
-	const char *ptr = str;
-	int64		tmp = 0;
-	bool		neg = false;
-
-	/*
-	 * Do our own scan, rather than relying on sscanf which might be broken
-	 * for long long.
-	 *
-	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
-	 * value as a negative number.
-	 */
-
-	/* skip leading spaces */
-	while (*ptr && isspace((unsigned char) *ptr))
-		ptr++;
-
-	/* handle sign */
-	if (*ptr == '-')
-	{
-		ptr++;
-		neg = true;
-	}
-	else if (*ptr == '+')
-		ptr++;
-
-	/* require at least one digit */
-	if (unlikely(!isdigit((unsigned char) *ptr)))
-		goto invalid_syntax;
-
-	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
-	{
-		int8		digit = (*ptr++ - '0');
-
-		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
-	}
-
-	/* allow trailing whitespace, but not other trailing chars */
-	while (*ptr != '\0' && isspace((unsigned char) *ptr))
-		ptr++;
-
-	if (unlikely(*ptr != '\0'))
-		goto invalid_syntax;
-
-	if (!neg)
-	{
-		/* could fail if input is most negative number */
-		if (unlikely(tmp == PG_INT64_MIN))
-			goto out_of_range;
-		tmp = -tmp;
-	}
-
-	*result = tmp;
-	return true;
-
-out_of_range:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("value \"%s\" is out of range for type %s",
-						str, "bigint")));
-	return false;
-
-invalid_syntax:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"bigint", str)));
-	return false;
-}
-
 /* int8in()
  */
 Datum
 int8in(PG_FUNCTION_ARGS)
 {
-	char	   *str = PG_GETARG_CSTRING(0);
-	int64		result;
+	char	   *num = PG_GETARG_CSTRING(0);
 
-	(void) scanint8(str, false, &result);
-	PG_RETURN_INT64(result);
+	PG_RETURN_INT64(pg_strtoint64(num));
 }
 
 
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index b93096f288..ebc2d222a3 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -325,6 +325,90 @@ pg_strtoint32(const char *s)
 	return 0;					/* keep compiler quiet */
 }
 
+/*
+ * Convert input string to a signed 64 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. Will throw
+ * ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int64
+pg_strtoint64(const char *s)
+{
+	const char *ptr = s;
+	int64		tmp = 0;
+	bool		neg = false;
+
+	/*
+	 * Do our own scan, rather than relying on sscanf which might be broken
+	 * for long long.
+	 *
+	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
+	 * value as a negative number.
+	 */
+
+	/* skip leading spaces */
+	while (*ptr && isspace((unsigned char) *ptr))
+		ptr++;
+
+	/* handle sign */
+	if (*ptr == '-')
+	{
+		ptr++;
+		neg = true;
+	}
+	else if (*ptr == '+')
+		ptr++;
+
+	/* require at least one digit */
+	if (unlikely(!isdigit((unsigned char) *ptr)))
+		goto invalid_syntax;
+
+	/* process digits */
+	while (*ptr && isdigit((unsigned char) *ptr))
+	{
+		int8		digit = (*ptr++ - '0');
+
+		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
+			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+			goto out_of_range;
+	}
+
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+		ptr++;
+
+	if (unlikely(*ptr != '\0'))
+		goto invalid_syntax;
+
+	if (!neg)
+	{
+		/* could fail if input is most negative number */
+		if (unlikely(tmp == PG_INT64_MIN))
+			goto out_of_range;
+		tmp = -tmp;
+	}
+
+	return tmp;
+
+out_of_range:
+	ereport(ERROR,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value \"%s\" is out of range for type %s",
+					s, "bigint")));
+
+invalid_syntax:
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"bigint", s)));
+
+	return 0;					/* keep compiler quiet */
+}
+
 /*
  * pg_itoa: converts a signed 16-bit integer to its string representation
  * and returns strlen(a).
@@ -628,3 +712,16 @@ pg_strtouint64(const char *str, char **endptr, int base)
 	return strtoul(str, endptr, base);
 #endif
 }
+
+/* XXX unfortunate API naming conflict */
+int64
+pg_strtoint64xx(const char *str, char **endptr, int base)
+{
+#ifdef _MSC_VER					/* MSVC only */
+	return _strtoi64(str, endptr, base);
+#elif defined(HAVE_STRTOLL) && SIZEOF_LONG < 8
+	return strtoll(str, endptr, base);
+#else
+	return strtol(str, endptr, base);
+#endif
+}
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c12b6f0615..026c5e3083 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -794,8 +794,8 @@ is_an_int(const char *str)
 /*
  * strtoint64 -- convert a string to 64-bit integer
  *
- * This function is a slightly modified version of scanint8() from
- * src/backend/utils/adt/int8.c.
+ * This function is a slightly modified version of pg_strtoint64() from
+ * src/backend/utils/adt/numutils.c.
  *
  * The function returns whether the conversion worked, and if so
  * "*result" is set to the result.
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 40fcb0ab6d..e8b2abace9 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -46,6 +46,7 @@ extern int	namestrcmp(Name name, const char *str);
 extern int32 pg_atoi(const char *s, int size, int c);
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
+extern int64 pg_strtoint64(const char *s);
 extern int	pg_itoa(int16 i, char *a);
 extern int	pg_ultoa_n(uint32 l, char *a);
 extern int	pg_ulltoa_n(uint64 l, char *a);
@@ -54,6 +55,7 @@ extern int	pg_lltoa(int64 ll, char *a);
 extern char *pg_ultostr_zeropad(char *str, uint32 value, int32 minwidth);
 extern char *pg_ultostr(char *str, uint32 value);
 extern uint64 pg_strtouint64(const char *str, char **endptr, int base);
+extern int64 pg_strtoint64xx(const char *str, char **endptr, int base);
 
 /* oid.c */
 extern oidvector *buildoidvector(const Oid *oids, int n);
diff --git a/src/include/utils/int8.h b/src/include/utils/int8.h
deleted file mode 100644
index 6571188f90..0000000000
--- a/src/include/utils/int8.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * int8.h
- *	  Declarations for operations on 64-bit integers.
- *
- *
- * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * src/include/utils/int8.h
- *
- * NOTES
- * These data types are supported on all 64-bit architectures, and may
- *	be supported through libraries on some 32-bit machines. If your machine
- *	is not currently supported, then please try to make it so, then post
- *	patches to the postgresql.org hackers mailing list.
- *
- *-------------------------------------------------------------------------
- */
-#ifndef INT8_H
-#define INT8_H
-
-extern bool scanint8(const char *str, bool errorOK, int64 *result);
-
-#endif							/* INT8_H */
-- 
2.33.1

v5-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patch (text/plain; charset=UTF-8)

From eef02792b933a48634fcea2f3425d32e8a7e19d4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 25 Nov 2021 07:44:32 +0100
Subject: [PATCH v5 4/6] Add test case for trailing junk after numeric literals

PostgreSQL currently accepts numeric literals with trailing
non-digits, such as 123abc, where abc is treated as the next token.
This may be a bit surprising.  This commit adds test cases for this;
subsequent commits intend to change this behavior.
---
 src/test/regress/expected/numerology.out | 55 ++++++++++++++++++++++++
 src/test/regress/sql/numerology.sql      | 14 ++++++
 2 files changed, 69 insertions(+)

diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 44d6c435de..32c6d80c03 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,6 +3,61 @@
 -- Test various combinations of numeric types and functions.
 --
 --
+-- Trailing junk in numeric literals
+--
+SELECT 123abc;
+ abc 
+-----
+ 123
+(1 row)
+
+SELECT 0x0o;
+ x0o 
+-----
+   0
+(1 row)
+
+SELECT 1_2_3;
+ _2_3 
+------
+    1
+(1 row)
+
+SELECT 0.a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT .0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e1a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0e;
+  e  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e+a;
+ERROR:  syntax error at or near "+"
+LINE 1: SELECT 0.0e+a;
+                   ^
+--
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
 --  so let's try explicit conversions for now - tgl 97/05/07
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fddb58f8fd..70447a95fa 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,6 +3,20 @@
 -- Test various combinations of numeric types and functions.
 --
 
+--
+-- Trailing junk in numeric literals
+--
+
+SELECT 123abc;
+SELECT 0x0o;
+SELECT 1_2_3;
+SELECT 0.a;
+SELECT 0.0a;
+SELECT .0a;
+SELECT 0.0e1a;
+SELECT 0.0e;
+SELECT 0.0e+a;
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.33.1

v5-0005-Reject-trailing-junk-after-numeric-literals.patch (text/plain; charset=UTF-8)

From 278f30528fa455c16be86b4c8c6b8fe7d9ad26dc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 25 Nov 2021 08:41:48 +0100
Subject: [PATCH v5 5/6] Reject trailing junk after numeric literals

After this, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which were previously scanned
as two tokens: 123 and abc.  This behavior is undocumented and
surprising, and it might also interfere with some extended numeric
literal syntax being contemplated for the future.
---
 src/backend/parser/scan.l                | 27 ++++++----
 src/fe_utils/psqlscan.l                  | 21 +++++---
 src/interfaces/ecpg/preproc/pgc.l        |  4 ++
 src/test/regress/expected/numerology.out | 68 +++++++++---------------
 4 files changed, 61 insertions(+), 59 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 4e02815803..42646171e5 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -399,6 +399,10 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
 
 other			.
@@ -996,19 +1000,24 @@ other			.
 					return FCONST;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 */
-					yyless(yyleng - 1);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
+				}
+{integer_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{decimal_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{real_junk}		{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
 				}
 
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 9aac166aa0..4cd5e69d00 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -337,6 +337,10 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
 
 /* psql-specific: characters allowed in variable names */
@@ -856,17 +860,18 @@ other			.
 					ECHO;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 * (in psql, we don't actually care...)
-					 */
-					yyless(yyleng - 1);
 					ECHO;
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
+					ECHO;
+				}
+{integer_junk}	{
+					ECHO;
+				}
+{decimal_junk}	{
+					ECHO;
+				}
+{real_junk}		{
 					ECHO;
 				}
 
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 7c3bf52bfa..e641095496 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
 
 /* special characters for other dbms */
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 32c6d80c03..2f176ccb52 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,57 +6,41 @@
 -- Trailing junk in numeric literals
 --
 SELECT 123abc;
- abc 
------
- 123
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+               ^
 SELECT 0x0o;
- x0o 
------
-   0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+               ^
 SELECT 1_2_3;
- _2_3 
-------
-    1
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+               ^
 SELECT 0.a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+               ^
 SELECT 0.0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+               ^
 SELECT .0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+               ^
 SELECT 0.0e1a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+               ^
 SELECT 0.0e;
-  e  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+               ^
 SELECT 0.0e+a;
-ERROR:  syntax error at or near "+"
+ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
-                   ^
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.33.1

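A rough C model of what the new {integer_junk} rule does, simplified to decimal integers only (no {decimal} or {real}). The names scan_decinteger and ScanResult are hypothetical, and is_ident_start() assumes {ident_start} is [A-Za-z\200-\377_] as defined in scan.l. Because flex takes the longest match, the junk rule (digits plus one identifier character) beats {integer}, which is why the expected output above reports "123a" rather than "123abc".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match scan.l's {ident_start}: [A-Za-z\200-\377_] */
static bool
is_ident_start(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		c == '_' || c >= 0x80;
}

typedef enum
{
	SCAN_NOT_NUMBER,			/* input does not start with a digit */
	SCAN_INTEGER,				/* clean {integer}: accepted */
	SCAN_TRAILING_JUNK			/* {integer}{ident_start}: rejected */
} ScanResult;

/*
 * scan_decinteger (hypothetical): classify the token at the start of s.
 * *toklen is set to the length flex would match for the winning rule,
 * unless the result is SCAN_NOT_NUMBER.
 */
static ScanResult
scan_decinteger(const char *s, size_t *toklen)
{
	size_t		n = 0;

	assert(s != NULL && toklen != NULL);

	while (s[n] >= '0' && s[n] <= '9')
		n++;
	if (n == 0)
		return SCAN_NOT_NUMBER;

	if (is_ident_start((unsigned char) s[n]))
	{
		/* the junk rule is one char longer than {integer}, so it wins */
		*toklen = n + 1;
		return SCAN_TRAILING_JUNK;
	}

	*toklen = n;				/* plain integer; next char starts a new token */
	return SCAN_INTEGER;
}
```

For example, "1_2_3" gives SCAN_TRAILING_JUNK with a token length of 2 (matching "1_" in the new error), while "123 abc" and "42+1" still scan as plain integers.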
v5-0006-Non-decimal-integer-literals.patch (text/plain; charset=UTF-8)

From d9936e79d59e4971aedbe972346de337ac114678 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Sun, 31 Oct 2021 15:42:18 +0100
Subject: [PATCH v5 6/6] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  26 ++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/scan.l                  |  96 ++++++++++----
 src/backend/utils/adt/numutils.c           | 140 +++++++++++++++++++++
 src/fe_utils/psqlscan.l                    |  78 +++++++++---
 src/interfaces/ecpg/preproc/pgc.l          |  93 +++++++-------
 src/test/regress/expected/int2.out         |  19 +++
 src/test/regress/expected/int4.out         |  19 +++
 src/test/regress/expected/int8.out         |  19 +++
 src/test/regress/expected/numerology.out   |  59 ++++++++-
 src/test/regress/sql/int2.sql              |   7 ++
 src/test/regress/sql/int4.sql              |   7 ++
 src/test/regress/sql/int8.sql              |   7 ++
 src/test/regress/sql/numerology.sql        |  21 +++-
 15 files changed, 508 insertions(+), 90 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), and <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 11d9dd60c2..ce88c483a2 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 42646171e5..cde870b463 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,25 +385,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
+param			\${decinteger}
 
 other			.
 
@@ -979,20 +994,41 @@ other			.
 					return PARAM;
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail}		{
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1007,11 +1043,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1307,17 +1355,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index ebc2d222a3..776612874f 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -173,6 +173,17 @@ pg_atoi(const char *s, int size, int c)
 	return (int32) l;
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -208,6 +219,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -216,6 +269,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -284,6 +338,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -292,6 +388,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -368,6 +465,48 @@ pg_strtoint64(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -376,6 +515,7 @@ pg_strtoint64(const char *s)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 4cd5e69d00..4c95eb60e8 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,25 +323,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
+param			\${decinteger}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -845,13 +860,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -865,10 +898,19 @@ other			.
 {realfail2}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index e641095496..fde211a33f 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,25 +351,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
+param			\${decinteger}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -399,9 +414,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -414,7 +426,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -929,17 +941,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
+				}
+{hexinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 16);
 				}
-{decimal}		{
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -948,18 +963,25 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is an {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
+<SQL>{octinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 8);
+				}
+<SQL>{bininteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 2);
+				}
+
 <SQL>{
 :{identifier}((("->"|\.){identifier})|(\[{array}\]))*	{
 					base_yylval.str = mm_strdup(yytext+1);
@@ -1015,19 +1037,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *)yytext,&endptr,16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1552,17 +1561,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..220e1493e8 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..6fdbd58b40 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,22 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..edd15a4353 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 2f176ccb52..5313b47d3b 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,33 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -41,6 +60,42 @@ SELECT 0.0e+a;
 ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
                ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "SELECT 0b"
+LINE 1: SELECT 0b;
+        ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "SELECT 0o"
+LINE 1: SELECT 0o;
+        ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "SELECT 0x"
+LINE 1: SELECT 0x;
+        ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..0dee22fe6d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..2a69b1614e 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,10 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..b7ad696dd8 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 70447a95fa..fd7e02e536 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,16 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -17,6 +23,19 @@
 SELECT 0.0e;
 SELECT 0.0e+a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.33.1
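In the ecpg part of this patch, a prefixed literal is converted by passing `yytext + 2` and the base to `strtoint()`. The sketch below shows that conversion on its own. It uses `strtoll()`, and the helper name `int_literal_value` is hypothetical. Unlike the lexer, the helper cannot assume its input was already validated, so it checks the digits itself. Where it returns false, ecpg would instead fall back to treating the token as a float.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a decimal, 0x, 0o, or 0b integer literal to
 * a long long.  Returns false on bad syntax or overflow.
 */
static bool
int_literal_value(const char *tok, long long *out)
{
	int			base = 10;
	const char *digits = tok;
	const char *valid = "0123456789";
	char	   *endptr;
	long long	val;

	/* Pick the base from a 0x/0o/0b prefix and skip it, like yytext + 2. */
	if (tok[0] == '0')
	{
		switch (tok[1])
		{
			case 'x':
			case 'X':
				base = 16;
				valid = "0123456789abcdefABCDEF";
				break;
			case 'o':
			case 'O':
				base = 8;
				valid = "01234567";
				break;
			case 'b':
			case 'B':
				base = 2;
				valid = "01";
				break;
		}
		if (base != 10)
			digits = tok + 2;
	}

	/*
	 * Check the digits ourselves.  strtoll() also accepts leading whitespace
	 * and a sign, and with base 16 an optional second "0x" prefix, so
	 * "0x0x5" would otherwise be read as 5.
	 */
	if (digits[0] == '\0' || strspn(digits, valid) != strlen(digits))
		return false;

	errno = 0;
	val = strtoll(digits, &endptr, base);
	if (errno == ERANGE || *endptr != '\0')
		return false;			/* too large for long long */

	*out = val;
	return true;
}
```

With this helper, `int_literal_value("0x42F", &v)` sets `v` to 1071, matching the regression output above. `"0x"` and `"0b102"` are rejected.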

#12

Zhihong Yu

zyu@yugabyte.com

about 4 years ago

In reply to: Peter Eisentraut (#11)

Re: Non-decimal integer literals

On Thu, Nov 25, 2021 at 5:18 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 01.11.21 07:09, Peter Eisentraut wrote:

Here is an updated patch for this. It's the previous patch polished a
bit more, and it contains changes so that numeric literals reject
trailing identifier parts without whitespace in between, as discussed.
Maybe I should split that into incremental patches, but for now I only
have the one. I don't have a patch for the underscores in numeric
literals yet. It's in progress, but not ready.

Here is a progressed version of this work, split into more incremental
patches. The first three patches are harmless code cleanups. Patch 3
has an interesting naming conflict, noted in the commit message; ideas
welcome. Patches 4 and 5 handle the rejection of trailing junk after
numeric literals, as discussed. I have expanded that compared to the v4
patch to also cover non-integer literals. It also comes with more tests
now. Patch 6 is the titular introduction of non-decimal integer
literals, unchanged from before.

Hi,
For patch 3,

+int64
+pg_strtoint64(const char *s)

How about naming the above function pg_scanint64()?
pg_strtoint64xx() can be named pg_strtoint64() - this would align with
existing function:

pg_strtouint64(const char *str, char **endptr, int base)

Cheers
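The trailing-junk rejection and the new prefixes both rely on flex's matching rules. Flex picks the longest match, and on a tie it picks the rule listed first. Those two rules decide between {hexinteger}, {hexfail} and the *_junk patterns. The hand-written sketch below mimics that outcome for integer tokens only: decimal points and exponents are ignored. It uses hypothetical names (`scan_integer`, `NumTok`) and assumes `ident_start` is `[A-Za-z\200-\377_]` as in scan.l. On the inputs tested in numerology.sql, it reproduces the token boundaries seen in the expected output, for example "123a", "0x0o", and "0b" reported as an invalid binary integer.

```c
#include <stddef.h>

typedef enum
{
	TOK_DECINTEGER, TOK_HEXINTEGER, TOK_OCTINTEGER, TOK_BININTEGER,
	TOK_HEXFAIL, TOK_OCTFAIL, TOK_BINFAIL,
	TOK_JUNK					/* any {*integer_junk}: digits plus one ident char */
} NumTok;

/* Assumed to match scan.l's ident_start: [A-Za-z\200-\377_] */
static int
is_ident_start(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		c == '_' || c >= 0x80;
}

static int
is_base_digit(unsigned char c, int base)
{
	switch (base)
	{
		case 2:
			return c == '0' || c == '1';
		case 8:
			return c >= '0' && c <= '7';
		case 16:
			return (c >= '0' && c <= '9') ||
				(c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
		default:
			return c >= '0' && c <= '9';
	}
}

/*
 * Classify the integer token at the start of s (which must begin with a
 * decimal digit) and store its length in *len.
 */
static NumTok
scan_integer(const char *s, size_t *len)
{
	int			base = 10;
	size_t		i = 0;

	if (s[0] == '0')
	{
		switch (s[1])
		{
			case 'x': case 'X': base = 16; break;
			case 'o': case 'O': base = 8; break;
			case 'b': case 'B': base = 2; break;
		}
	}

	if (base != 10)
	{
		i = 2;
		/*
		 * A prefix with no digits: {hexfail} etc. tie with {decinteger_junk}
		 * at length 2 and win because they are listed first.
		 */
		if (!is_base_digit((unsigned char) s[i], base))
		{
			*len = 2;
			return base == 16 ? TOK_HEXFAIL :
				base == 8 ? TOK_OCTFAIL : TOK_BINFAIL;
		}
	}

	while (is_base_digit((unsigned char) s[i], base))
		i++;

	/* An identifier character right after the digits makes it junk. */
	if (is_ident_start((unsigned char) s[i]))
	{
		*len = i + 1;
		return TOK_JUNK;
	}

	*len = i;
	return base == 16 ? TOK_HEXINTEGER :
		base == 8 ? TOK_OCTINTEGER :
		base == 2 ? TOK_BININTEGER : TOK_DECINTEGER;
}
```

This is a reading aid, not a transcription of scan.l. The real scanner also has to arbitrate against {numeric}, {real} and the realfail rules.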

#13

John Naylor

john.naylor@enterprisedb.com

about 4 years ago

In reply to: Zhihong Yu (#12)

Re: Non-decimal integer literals

Hi Peter,

0001

-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now, if you
like. I don't have any good ideas about 0003 at the moment.

0005

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1 ({integer}|{decimal})[Ee]
 realfail2 ({integer}|{decimal})[Ee][-+]

+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
  }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor doesn't match
other failure states:

+SELECT 0b;
+ERROR:  invalid binary integer at or near "SELECT 0b"
+LINE 1: SELECT 0b;
+        ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^

We might consider some tests for ECPG since lack of coverage has been a
problem.

Also, I'm curious: how does the spec work as far as deciding the year of
release, or feature-freezing of new items?
--
John Naylor
EDB: http://www.enterprisedb.com

#14

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: Zhihong Yu (#12)

Re: Non-decimal integer literals

On 25.11.21 16:46, Zhihong Yu wrote:

For patch 3,

+int64
+pg_strtoint64(const char *s)

How about naming the above function pg_scanint64()?
pg_strtoint64xx() can be named pg_strtoint64() - this would align with
existing function:

pg_strtouint64(const char *str, char **endptr, int base)

That would be one way. But the existing pg_strtointNN() functions are
pretty widely used, so I would tend toward finding another name for the
less used pg_strtouint64(), maybe pg_strtouint64x() ("extended").

#15

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: John Naylor (#13)

Re: Non-decimal integer literals

On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now, if
you like.

done

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1 ({integer}|{decimal})[Ee]
 realfail2 ({integer}|{decimal})[Ee][-+]

+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

Yeah, it's a bit weird that not all the symbols are used in ecpg. I'll
look into explaining this better.

0006
+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
  }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }
It seems these could use SET_YYLLOC(), since the error cursor doesn't
match other failure states:

We might consider some tests for ECPG since lack of coverage has been a
problem.

right

Also, I'm curious: how does the spec work as far as deciding the year of
release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1. But the preparation work (up to v5-0005, and some other number
parsing refactoring that I'm seeing) could be considered for PG15.

I'll move this to the next CF and come back with an updated patch set in
a little while.

#16

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: Peter Eisentraut (#15)

7 attachment(s)

Re: Non-decimal integer literals

There has been some other refactoring going on, which made this patch
set out of date. So here is an update.

The old pg_strtouint64() has been removed, so there is no longer a
naming concern with patch 0001. That one should be good to go.
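Patch 0001 drops the errorOK mode of scanint8(). Call sites that only wanted a yes/no answer now call strtoi64() or strtoul() directly. With those C library functions, success does not clear errno, so a caller that tests errno afterwards has to clear it first, as the make_const() hunk does. The sketch below shows the pattern for a uint32 option value such as proto_version. The helper name `parse_uint32_strict` is hypothetical. Rejecting a leading sign or whitespace is an assumption about what such an option should accept.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a plain decimal number that must fit in
 * uint32.  Returns false for bad syntax or out-of-range values.
 */
static bool
parse_uint32_strict(const char *s, uint32_t *out)
{
	unsigned long parsed;
	char	   *endptr;

	/*
	 * strtoul() skips leading whitespace and accepts a sign, turning "-1"
	 * into a huge unsigned value, so insist on a leading digit.
	 */
	if (!isdigit((unsigned char) s[0]))
		return false;

	errno = 0;					/* a stale errno would cause a false failure */
	parsed = strtoul(s, &endptr, 10);
	if (errno == ERANGE || *endptr != '\0')
		return false;

	/* Never true where unsigned long is 32 bits; ERANGE covers that case. */
	if (parsed > UINT32_MAX)
		return false;

	*out = (uint32_t) parsed;
	return true;
}
```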

I also found that pg_atoi(), yet another way to parse integers, has
mostly outlived its usefulness, so I removed its last two callers and
then the function itself in 0002 and 0003.
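After 0003, int2vectorin() parses its whitespace-separated elements inline with strtol() instead of calling pg_atoi(). The sketch below shows that kind of loop, using hypothetical names and a small result enum in place of ereport(). Its exact checks are assumptions and are not copied from the patch. It requires each number to end at whitespace or end of string, and it reports a fourth condition when there are more elements than the array can hold.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum
{
	INT2LIST_OK, INT2LIST_SYNTAX, INT2LIST_RANGE, INT2LIST_TOO_MANY
} Int2ListResult;

/*
 * Hypothetical: parse up to maxn whitespace-separated int16 values from s
 * into out[], storing the count in *n.
 */
static Int2ListResult
parse_int2_list(const char *s, int16_t *out, int maxn, int *n)
{
	*n = 0;
	for (;;)
	{
		long		l;
		char	   *endp;

		while (*s && isspace((unsigned char) *s))
			s++;
		if (*s == '\0')
			return INT2LIST_OK;
		if (*n >= maxn)
			return INT2LIST_TOO_MANY;

		errno = 0;
		l = strtol(s, &endp, 10);
		if (endp == s)
			return INT2LIST_SYNTAX;	/* no digits at all */
		if (errno == ERANGE || l < INT16_MIN || l > INT16_MAX)
			return INT2LIST_RANGE;
		if (*endp && !isspace((unsigned char) *endp))
			return INT2LIST_SYNTAX;	/* e.g. "1,2" or "12abc" */

		out[(*n)++] = (int16_t) l;
		s = endp;
	}
}
```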

The remaining patches are as before, with some of the review comments
applied. I still need to write some lexing unit tests for ecpg, which I
haven't gotten to yet. This affects patches 0004 and 0005.

As mentioned before, patches 0006 and 0007 are more feature previews at
this point.


On 01.12.21 16:47, Peter Eisentraut wrote:

On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now, if
you like.

done
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
  realfail1 ({integer}|{decimal})[Ee]
  realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}
A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.
Yeah, it's a bit weird that not all the symbols are used in ecpg. I'll
look into explaining this better.
0006
+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
   }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }
It seems these could use SET_YYLLOC(), since the error cursor doesn't
match other failure states:
ok

We might consider some tests for ECPG since lack of coverage has been
a problem.

right

Also, I'm curious: how does the spec work as far as deciding the year
of release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1. But the preparation work (up to v5-0005, and some other number
parsing refactoring that I'm seeing) could be considered for PG15.

I'll move this to the next CF and come back with an updated patch set in
a little while.

Attachments:

v6-0001-Move-scanint8-to-numutils.c.patch (text/plain; charset=UTF-8)

From 4aa1329c3aad512f33a56a05fcc465793ef19b1d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 1/7] Move scanint8() to numutils.c

Move scanint8() to numutils.c and rename to pg_strtoint64().  We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent.  The API is also changed to no longer provide the errorOK
case.  Users that need the error checking can use strtoi64().

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/parse_node.c             | 12 ++-
 src/backend/replication/pgoutput/pgoutput.c |  9 ++-
 src/backend/utils/adt/int8.c                | 90 +--------------------
 src/backend/utils/adt/numutils.c            | 84 +++++++++++++++++++
 src/bin/pgbench/pgbench.c                   |  4 +-
 src/include/utils/builtins.h                |  1 +
 src/include/utils/int8.h                    | 25 ------
 7 files changed, 103 insertions(+), 122 deletions(-)
 delete mode 100644 src/include/utils/int8.h

diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 8cfe6f67c0..0eefd5427a 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -26,7 +26,6 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
@@ -353,7 +352,6 @@ make_const(ParseState *pstate, A_Const *aconst)
 {
 	Const	   *con;
 	Datum		val;
-	int64		val64;
 	Oid			typeid;
 	int			typelen;
 	bool		typebyval;
@@ -384,8 +382,15 @@ make_const(ParseState *pstate, A_Const *aconst)
 			break;
 
 		case T_Float:
+		{
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(aconst->val.fval.val, true, &val64))
+
+			int64		val64;
+			char	   *endptr;
+
+			errno = 0;
+			val64 = strtoi64(aconst->val.fval.val, &endptr, 10);
+			if (errno == 0 && *endptr == '\0')
 			{
 				/*
 				 * It might actually fit in int32. Probably only INT_MIN can
@@ -425,6 +430,7 @@ make_const(ParseState *pstate, A_Const *aconst)
 				typebyval = false;
 			}
 			break;
+		}
 
 		case T_String:
 
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 6f6a203dea..2f0f40c75d 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -21,7 +21,6 @@
 #include "replication/logicalproto.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
-#include "utils/int8.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -205,7 +204,8 @@ parse_output_parameters(List *options, PGOutputData *data)
 		/* Check each param, whether or not we recognize it */
 		if (strcmp(defel->defname, "proto_version") == 0)
 		{
-			int64		parsed;
+			unsigned long parsed;
+			char	   *endptr;
 
 			if (protocol_version_given)
 				ereport(ERROR,
@@ -213,12 +213,13 @@ parse_output_parameters(List *options, PGOutputData *data)
 						 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
 
-			if (!scanint8(strVal(defel->arg), true, &parsed))
+			parsed = strtoul(strVal(defel->arg), &endptr, 10);
+			if (errno || *endptr != '\0')
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid proto_version")));
 
-			if (parsed > PG_UINT32_MAX || parsed < 0)
+			if (parsed > PG_UINT32_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("proto_version \"%s\" out of range",
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 2168080dcc..f8f557526f 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -24,7 +24,6 @@
 #include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 
 
 typedef struct
@@ -45,99 +44,14 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is bad.
- * If errorOK is true, just return "false" for bad input.
- */
-bool
-scanint8(const char *str, bool errorOK, int64 *result)
-{
-	const char *ptr = str;
-	int64		tmp = 0;
-	bool		neg = false;
-
-	/*
-	 * Do our own scan, rather than relying on sscanf which might be broken
-	 * for long long.
-	 *
-	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
-	 * value as a negative number.
-	 */
-
-	/* skip leading spaces */
-	while (*ptr && isspace((unsigned char) *ptr))
-		ptr++;
-
-	/* handle sign */
-	if (*ptr == '-')
-	{
-		ptr++;
-		neg = true;
-	}
-	else if (*ptr == '+')
-		ptr++;
-
-	/* require at least one digit */
-	if (unlikely(!isdigit((unsigned char) *ptr)))
-		goto invalid_syntax;
-
-	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
-	{
-		int8		digit = (*ptr++ - '0');
-
-		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
-	}
-
-	/* allow trailing whitespace, but not other trailing chars */
-	while (*ptr != '\0' && isspace((unsigned char) *ptr))
-		ptr++;
-
-	if (unlikely(*ptr != '\0'))
-		goto invalid_syntax;
-
-	if (!neg)
-	{
-		/* could fail if input is most negative number */
-		if (unlikely(tmp == PG_INT64_MIN))
-			goto out_of_range;
-		tmp = -tmp;
-	}
-
-	*result = tmp;
-	return true;
-
-out_of_range:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("value \"%s\" is out of range for type %s",
-						str, "bigint")));
-	return false;
-
-invalid_syntax:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"bigint", str)));
-	return false;
-}
-
 /* int8in()
  */
 Datum
 int8in(PG_FUNCTION_ARGS)
 {
-	char	   *str = PG_GETARG_CSTRING(0);
-	int64		result;
+	char	   *num = PG_GETARG_CSTRING(0);
 
-	(void) scanint8(str, false, &result);
-	PG_RETURN_INT64(result);
+	PG_RETURN_INT64(pg_strtoint64(num));
 }
 
 
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 6a9c00fdd3..7ac7e5dbd3 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -325,6 +325,90 @@ pg_strtoint32(const char *s)
 	return 0;					/* keep compiler quiet */
 }
 
+/*
+ * Convert input string to a signed 64 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. Will throw
+ * ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int64
+pg_strtoint64(const char *s)
+{
+	const char *ptr = s;
+	int64		tmp = 0;
+	bool		neg = false;
+
+	/*
+	 * Do our own scan, rather than relying on sscanf which might be broken
+	 * for long long.
+	 *
+	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
+	 * value as a negative number.
+	 */
+
+	/* skip leading spaces */
+	while (*ptr && isspace((unsigned char) *ptr))
+		ptr++;
+
+	/* handle sign */
+	if (*ptr == '-')
+	{
+		ptr++;
+		neg = true;
+	}
+	else if (*ptr == '+')
+		ptr++;
+
+	/* require at least one digit */
+	if (unlikely(!isdigit((unsigned char) *ptr)))
+		goto invalid_syntax;
+
+	/* process digits */
+	while (*ptr && isdigit((unsigned char) *ptr))
+	{
+		int8		digit = (*ptr++ - '0');
+
+		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
+			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+			goto out_of_range;
+	}
+
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+		ptr++;
+
+	if (unlikely(*ptr != '\0'))
+		goto invalid_syntax;
+
+	if (!neg)
+	{
+		/* could fail if input is most negative number */
+		if (unlikely(tmp == PG_INT64_MIN))
+			goto out_of_range;
+		tmp = -tmp;
+	}
+
+	return tmp;
+
+out_of_range:
+	ereport(ERROR,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value \"%s\" is out of range for type %s",
+					s, "bigint")));
+
+invalid_syntax:
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"bigint", s)));
+
+	return 0;					/* keep compiler quiet */
+}
+
 /*
  * pg_itoa: converts a signed 16-bit integer to its string representation
  * and returns strlen(a).
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ea9639984c..6b5f8bc071 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -787,8 +787,8 @@ is_an_int(const char *str)
 /*
  * strtoint64 -- convert a string to 64-bit integer
  *
- * This function is a slightly modified version of scanint8() from
- * src/backend/utils/adt/int8.c.
+ * This function is a slightly modified version of pg_strtoint64() from
+ * src/backend/utils/adt/numutils.c.
  *
  * The function returns whether the conversion worked, and if so
  * "*result" is set to the result.
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index b07eefaf1e..1ef8359906 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -46,6 +46,7 @@ extern int	namestrcmp(Name name, const char *str);
 extern int32 pg_atoi(const char *s, int size, int c);
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
+extern int64 pg_strtoint64(const char *s);
 extern int	pg_itoa(int16 i, char *a);
 extern int	pg_ultoa_n(uint32 l, char *a);
 extern int	pg_ulltoa_n(uint64 l, char *a);
diff --git a/src/include/utils/int8.h b/src/include/utils/int8.h
deleted file mode 100644
index 6571188f90..0000000000
--- a/src/include/utils/int8.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * int8.h
- *	  Declarations for operations on 64-bit integers.
- *
- *
- * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * src/include/utils/int8.h
- *
- * NOTES
- * These data types are supported on all 64-bit architectures, and may
- *	be supported through libraries on some 32-bit machines. If your machine
- *	is not currently supported, then please try to make it so, then post
- *	patches to the postgresql.org hackers mailing list.
- *
- *-------------------------------------------------------------------------
- */
-#ifndef INT8_H
-#define INT8_H
-
-extern bool scanint8(const char *str, bool errorOK, int64 *result);
-
-#endif							/* INT8_H */

base-commit: 8112bcf0cc602e00e95eab6c4bdc0eb73b5b547d
-- 
2.34.1
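The comment on pg_strtoint64() explains why digits are accumulated as a negative number: INT64_MIN has no positive int64 counterpart. The sketch below shows the same technique standalone. It assumes plain C99 `int64_t` and uses hand-written overflow checks in place of PostgreSQL's pg_mul_s64_overflow()/pg_sub_s64_overflow(). It returns an error code instead of calling ereport(), and the names `parse_int64` and `ParseResult` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
	PARSE_OK, PARSE_SYNTAX, PARSE_RANGE
} ParseResult;

/*
 * Parse a decimal int64, allowing leading and trailing whitespace.
 * Digits are accumulated as a negative number so that INT64_MIN can be
 * represented while scanning.
 */
static ParseResult
parse_int64(const char *s, int64_t *result)
{
	const char *ptr = s;
	int64_t		tmp = 0;
	bool		neg = false;

	while (*ptr && isspace((unsigned char) *ptr))
		ptr++;

	if (*ptr == '-')
	{
		neg = true;
		ptr++;
	}
	else if (*ptr == '+')
		ptr++;

	/* require at least one digit */
	if (!isdigit((unsigned char) *ptr))
		return PARSE_SYNTAX;

	while (isdigit((unsigned char) *ptr))
	{
		int			digit = *ptr++ - '0';

		/* tmp * 10 - digit must not go below INT64_MIN */
		if (tmp < INT64_MIN / 10)
			return PARSE_RANGE;
		tmp *= 10;
		if (tmp < INT64_MIN + digit)
			return PARSE_RANGE;
		tmp -= digit;
	}

	/* allow trailing whitespace, but not other trailing chars */
	while (*ptr && isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr != '\0')
		return PARSE_SYNTAX;

	if (!neg)
	{
		/* 9223372036854775808 reaches INT64_MIN but has no positive form */
		if (tmp == INT64_MIN)
			return PARSE_RANGE;
		tmp = -tmp;
	}

	*result = tmp;
	return PARSE_OK;
}
```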

v6-0002-Remove-one-use-of-pg_atoi.patch (text/plain; charset=UTF-8)

From 4651d0b09e9dcac554efba099a27c94748b33ccb Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 2/7] Remove one use of pg_atoi()

There was no real need to use this here instead of a simpler API.
---
 src/backend/utils/adt/jsonpath_gram.y | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index bd5d4488a0..5982672558 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -232,7 +232,7 @@ array_accessor:
 	;
 
 any_level:
-	INT_P							{ $$ = pg_atoi($1.val, 4, 0); }
+	INT_P							{ $$ = pg_strtoint32($1.val); }
 	| LAST_P						{ $$ = -1; }
 	;
 
-- 
2.34.1

v6-0003-Remove-pg_atoi.patch

From e32f1eed77d8040e2d79e5251b3c8f897dbeb223 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 3/7] Remove pg_atoi()

The last caller was int2vectorin(), and having such a general function
for one user didn't seem useful, so just put the required parts inline
and remove the function.
---
 src/backend/utils/adt/int.c      | 32 ++++++++++--
 src/backend/utils/adt/numutils.c | 88 --------------------------------
 src/include/utils/builtins.h     |  1 -
 3 files changed, 28 insertions(+), 93 deletions(-)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index e9f108425c..ed2a9016f5 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -146,15 +146,39 @@ int2vectorin(PG_FUNCTION_ARGS)
 
 	result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
 
-	for (n = 0; *intString && n < FUNC_MAX_ARGS; n++)
+	for (n = 0; n < FUNC_MAX_ARGS; n++)
 	{
+		long		l;
+		char	   *endp;
+
 		while (*intString && isspace((unsigned char) *intString))
 			intString++;
 		if (*intString == '\0')
 			break;
-		result->values[n] = pg_atoi(intString, sizeof(int16), ' ');
-		while (*intString && !isspace((unsigned char) *intString))
-			intString++;
+
+		errno = 0;
+		l = strtol(intString, &endp, 10);
+
+		if (intString == endp)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s: \"%s\"",
+							"smallint", intString)));
+
+		if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
+			ereport(ERROR,
+					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					 errmsg("value \"%s\" is out of range for type %s", intString,
+							"smallint")));
+
+		if (*endp && *endp != ' ')
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s: \"%s\"",
+							"integer", intString)));
+
+		result->values[n] = l;
+		intString = endp;
 	}
 	while (*intString && isspace((unsigned char) *intString))
 		intString++;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 7ac7e5dbd3..18de54da40 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,94 +85,6 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
-/*
- * pg_atoi: convert string to integer
- *
- * allows any number of leading or trailing whitespace characters.
- *
- * 'size' is the sizeof() the desired integral result (1, 2, or 4 bytes).
- *
- * c, if not 0, is a terminator character that may appear after the
- * integer (plus whitespace).  If 0, the string must end after the integer.
- *
- * Unlike plain atoi(), this will throw ereport() upon bad input format or
- * overflow.
- */
-int32
-pg_atoi(const char *s, int size, int c)
-{
-	long		l;
-	char	   *badp;
-
-	/*
-	 * Some versions of strtol treat the empty string as an error, but some
-	 * seem not to.  Make an explicit test to be sure we catch it.
-	 */
-	if (s == NULL)
-		elog(ERROR, "NULL pointer");
-	if (*s == 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	errno = 0;
-	l = strtol(s, &badp, 10);
-
-	/* We made no progress parsing the string, so bail out */
-	if (s == badp)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	switch (size)
-	{
-		case sizeof(int32):
-			if (errno == ERANGE
-#if defined(HAVE_LONG_INT_64)
-			/* won't get ERANGE on these with 64-bit longs... */
-				|| l < INT_MIN || l > INT_MAX
-#endif
-				)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for type %s", s,
-								"integer")));
-			break;
-		case sizeof(int16):
-			if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for type %s", s,
-								"smallint")));
-			break;
-		case sizeof(int8):
-			if (errno == ERANGE || l < SCHAR_MIN || l > SCHAR_MAX)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for 8-bit integer", s)));
-			break;
-		default:
-			elog(ERROR, "unsupported result size: %d", size);
-	}
-
-	/*
-	 * Skip any trailing whitespace; if anything but whitespace remains before
-	 * the terminating character, bail out
-	 */
-	while (*badp && *badp != c && isspace((unsigned char) *badp))
-		badp++;
-
-	if (*badp && *badp != c)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	return (int32) l;
-}
-
 /*
  * Convert input string to a signed 16 bit integer.
  *
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 1ef8359906..60339d3dcf 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -43,7 +43,6 @@ extern void namestrcpy(Name name, const char *str);
 extern int	namestrcmp(Name name, const char *str);
 
 /* numutils.c */
-extern int32 pg_atoi(const char *s, int size, int c);
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
 extern int64 pg_strtoint64(const char *s);
-- 
2.34.1

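For illustration, the strtol()-based loop that replaces pg_atoi() in int2vectorin() can be sketched as a standalone function. This is a minimal approximation, not the backend code: parse_int2vector() and MAX_ELEMS are hypothetical names (MAX_ELEMS stands in for FUNC_MAX_ARGS), and each ereport(ERROR, ...) is replaced by returning false.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ELEMS 100			/* hypothetical stand-in for FUNC_MAX_ARGS */

/*
 * Hypothetical sketch: parse space-separated smallint values into values[]
 * and store the element count in *count.  Returns false on: no digits, a
 * value outside int16 range, junk right after a number, or leftover input
 * after MAX_ELEMS elements.
 */
static bool
parse_int2vector(const char *s, short *values, int *count)
{
	int			n;

	for (n = 0; n < MAX_ELEMS; n++)
	{
		long		l;
		char	   *endp;

		/* skip separators; stop at end of string */
		while (*s && isspace((unsigned char) *s))
			s++;
		if (*s == '\0')
			break;

		errno = 0;
		l = strtol(s, &endp, 10);

		if (s == endp)
			return false;		/* no digits consumed */
		if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
			return false;		/* does not fit in int16 */
		if (*endp && *endp != ' ')
			return false;		/* trailing junk after the number */

		values[n] = (short) l;
		s = endp;
	}

	/* too many elements: something other than whitespace remains */
	while (*s && isspace((unsigned char) *s))
		s++;
	if (*s != '\0')
		return false;

	*count = n;
	return true;
}
```

As in the patch, only a plain space is accepted directly after a number, so "1\t2" is rejected even though whitespace before each element is skipped.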
v6-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patch

From fb606a29ba7ae45e8dcf84a4be2f39aa5a54a648 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 4/7] Add test case for trailing junk after numeric literals

PostgreSQL currently accepts numeric literals with trailing
non-digits, such as 123abc, where the abc is treated as the next token.
This may be a bit surprising.  This commit adds test cases for this;
subsequent commits intend to change this behavior.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/test/regress/expected/numerology.out | 55 ++++++++++++++++++++++++
 src/test/regress/sql/numerology.sql      | 14 ++++++
 2 files changed, 69 insertions(+)

diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 44d6c435de..32c6d80c03 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,6 +3,61 @@
 -- Test various combinations of numeric types and functions.
 --
 --
+-- Trailing junk in numeric literals
+--
+SELECT 123abc;
+ abc 
+-----
+ 123
+(1 row)
+
+SELECT 0x0o;
+ x0o 
+-----
+   0
+(1 row)
+
+SELECT 1_2_3;
+ _2_3 
+------
+    1
+(1 row)
+
+SELECT 0.a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT .0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e1a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0e;
+  e  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e+a;
+ERROR:  syntax error at or near "+"
+LINE 1: SELECT 0.0e+a;
+                   ^
+--
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
 --  so let's try explicit conversions for now - tgl 97/05/07
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fddb58f8fd..70447a95fa 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,6 +3,20 @@
 -- Test various combinations of numeric types and functions.
 --
 
+--
+-- Trailing junk in numeric literals
+--
+
+SELECT 123abc;
+SELECT 0x0o;
+SELECT 1_2_3;
+SELECT 0.a;
+SELECT 0.0a;
+SELECT .0a;
+SELECT 0.0e1a;
+SELECT 0.0e;
+SELECT 0.0e+a;
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1

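The old behavior recorded by these tests follows from longest-match scanning: the number rule stops at the first non-digit, and whatever follows starts a new token. The deliberately tiny tokenizer below illustrates this. next_token() and TokKind are hypothetical; it only knows ASCII digits, letters and underscores, and it is not the flex scanner.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
	TOK_END,
	TOK_NUMBER,
	TOK_IDENT,
	TOK_OTHER
} TokKind;

/*
 * Hypothetical mini-scanner: copy the next token starting at *p into buf
 * (truncated to bufsz - 1 chars, bufsz must be > 0) and advance *p past it.
 */
static TokKind
next_token(const char **p, char *buf, size_t bufsz)
{
	const char *s = *p;
	const char *start;
	TokKind		kind;
	size_t		len;

	while (*s == ' ')
		s++;
	start = s;

	if (*s == '\0')
		kind = TOK_END;
	else if (isdigit((unsigned char) *s))
	{
		/* number: the longest run of digits, nothing more */
		while (isdigit((unsigned char) *s))
			s++;
		kind = TOK_NUMBER;
	}
	else if (isalpha((unsigned char) *s) || *s == '_')
	{
		/* identifier: letter or '_', then letters, digits, '_' */
		while (isalnum((unsigned char) *s) || *s == '_')
			s++;
		kind = TOK_IDENT;
	}
	else
	{
		s++;					/* any other single character */
		kind = TOK_OTHER;
	}

	len = (size_t) (s - start);
	if (len >= bufsz)
		len = bufsz - 1;
	memcpy(buf, start, len);
	buf[len] = '\0';
	*p = s;
	return kind;
}
```

Fed "123abc", it yields the number 123 followed by the identifier abc, matching the column label abc in the expected output; "1_2_3" likewise splits into 1 and _2_3.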
v6-0005-Reject-trailing-junk-after-numeric-literals.patch

From 8ee2a422ec86a055ccddd26f397760d372aec4a8 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 5/7] Reject trailing junk after numeric literals

After this, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which would be scanned as
two tokens: 123 and abc.  This is undocumented and surprising, and it
might also interfere with some extended numeric literal syntax being
contemplated for the future.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/scan.l                | 27 ++++++----
 src/fe_utils/psqlscan.l                  | 21 +++++---
 src/interfaces/ecpg/preproc/pgc.l        |  4 ++
 src/test/regress/expected/numerology.out | 68 +++++++++---------------
 4 files changed, 61 insertions(+), 59 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 76fd6996ed..f889c2faf7 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -399,6 +399,10 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
 
 other			.
@@ -996,19 +1000,24 @@ other			.
 					return FCONST;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 */
-					yyless(yyleng - 1);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
+				}
+{integer_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{decimal_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{real_junk}		{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
 				}
 
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index db8a8dfaf2..09709e6151 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -337,6 +337,10 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
 
 /* psql-specific: characters allowed in variable names */
@@ -855,17 +859,18 @@ other			.
 					ECHO;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 * (in psql, we don't actually care...)
-					 */
-					yyless(yyleng - 1);
 					ECHO;
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
+					ECHO;
+				}
+{integer_junk}	{
+					ECHO;
+				}
+{decimal_junk}	{
+					ECHO;
+				}
+{real_junk}		{
 					ECHO;
 				}
 
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index a2f8c7f3d8..110478059b 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
 
 /* special characters for other dbms */
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 32c6d80c03..2f176ccb52 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,57 +6,41 @@
 -- Trailing junk in numeric literals
 --
 SELECT 123abc;
- abc 
------
- 123
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+               ^
 SELECT 0x0o;
- x0o 
------
-   0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+               ^
 SELECT 1_2_3;
- _2_3 
-------
-    1
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+               ^
 SELECT 0.a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+               ^
 SELECT 0.0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+               ^
 SELECT .0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+               ^
 SELECT 0.0e1a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+               ^
 SELECT 0.0e;
-  e  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+               ^
 SELECT 0.0e+a;
-ERROR:  syntax error at or near "+"
+ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
-                   ^
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1

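The rules these lexer changes add can be approximated in plain C. After scanning digits, an optional fraction and an optional exponent, the literal is rejected if an exponent marker has no digits ({realfail1}, {realfail2}) or if the next byte could start an identifier ({integer_junk} and friends). scan_numeric_literal() and is_ident_start() are hypothetical; the latter approximates {ident_start} as ASCII letters, underscore, and bytes 0x80 and above.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum
{
	SCAN_NOT_NUMBER,			/* input does not start with a number */
	SCAN_OK,					/* *len holds the literal's length */
	SCAN_JUNK					/* literal followed by junk: reject */
} ScanResult;

/* Approximation of {ident_start}: ASCII letters, '_', and high-bit bytes. */
static int
is_ident_start(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		c == '_' || c >= 0x80;
}

/*
 * Hypothetical sketch of the post-patch rules for one decimal numeric
 * literal starting at s: digits, optional fraction, optional exponent.
 */
static ScanResult
scan_numeric_literal(const char *s, size_t *len)
{
	const char *p = s;
	int			ndigits = 0;

	while (isdigit((unsigned char) *p))
	{
		p++;
		ndigits++;
	}

	if (*p == '.')
	{
		if (p[1] == '.' && ndigits > 0)
		{
			/* "1..10": keep just the integer, as {decimalfail} does */
			*len = (size_t) (p - s);
			return SCAN_OK;
		}
		p++;
		while (isdigit((unsigned char) *p))
		{
			p++;
			ndigits++;
		}
	}

	if (ndigits == 0)
		return SCAN_NOT_NUMBER;	/* e.g. a lone "." */

	if (*p == 'e' || *p == 'E')
	{
		p++;
		if (*p == '+' || *p == '-')
			p++;
		if (!isdigit((unsigned char) *p))
			return SCAN_JUNK;	/* dangling exponent: {realfail1}/{realfail2} */
		while (isdigit((unsigned char) *p))
			p++;
	}

	/* {integer_junk}, {decimal_junk}, {real_junk} */
	if (is_ident_start((unsigned char) *p))
		return SCAN_JUNK;

	*len = (size_t) (p - s);
	return SCAN_OK;
}
```

Under this sketch every input in the new numerology.sql block (123abc, 0x0o, 1_2_3, 0.a, 0.0a, .0a, 0.0e1a, 0.0e, 0.0e+a) is classified as junk, while 1..10 still yields the literal 1.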
v6-0006-Non-decimal-integer-literals.patch

From 8cf484ed47263ecf257e3770715cfa83394f1fa4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 6/7] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  26 ++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/scan.l                  |  99 +++++++++++----
 src/backend/utils/adt/numutils.c           | 140 +++++++++++++++++++++
 src/fe_utils/psqlscan.l                    |  78 +++++++++---
 src/interfaces/ecpg/preproc/pgc.l          |  93 +++++++-------
 src/test/regress/expected/int2.out         |  19 +++
 src/test/regress/expected/int4.out         |  19 +++
 src/test/regress/expected/int8.out         |  19 +++
 src/test/regress/expected/numerology.out   |  59 ++++++++-
 src/test/regress/sql/int2.sql              |   7 ++
 src/test/regress/sql/int4.sql              |   7 ++
 src/test/regress/sql/int8.sql              |   7 ++
 src/test/regress/sql/numerology.sql        |  21 +++-
 15 files changed, 511 insertions(+), 90 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), and <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     Here are some examples:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
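As a quick check of the examples in the new documentation paragraph, the sketch below decodes a literal with an optional 0x, 0o or 0b prefix (either case) by validating the digits for that base and then handing them to the standard strtol(). parse_radix_literal() and digit_value() are hypothetical helpers. Signs are deliberately not accepted, mirroring the scanner, where unary minus is a separate token.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Numeric value of one digit character (0-9, a-f, A-F), or -1. */
static int
digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: decode "0x...", "0o...", "0b..." (prefix in either
 * case) or plain decimal digits into *out.  Every character after the
 * prefix is checked against the base first, so strtol() never sees a sign,
 * whitespace or a second prefix.
 */
static bool
parse_radix_literal(const char *s, long *out)
{
	int			base = 10;
	const char *digits = s;
	const char *p;
	char	   *endp;
	long		val;

	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
	{
		base = 16;
		digits = s + 2;
	}
	else if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O'))
	{
		base = 8;
		digits = s + 2;
	}
	else if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
	{
		base = 2;
		digits = s + 2;
	}

	if (*digits == '\0')
		return false;			/* empty, or a bare prefix like "0x" */

	for (p = digits; *p; p++)
	{
		int			d = digit_value(*p);

		if (d < 0 || d >= base)
			return false;		/* not a digit of this base */
	}

	errno = 0;
	val = strtol(digits, &endp, base);
	if (errno == ERANGE || *endp != '\0')
		return false;

	*out = val;
	return true;
}
```

With it, 0b100101 is 37, 0B10011001 is 153, 0o273 is 187, 0O755 is 493, 0x42f is 1071 and 0XFFFF is 65535; a bare 0x is rejected, much as {hexfail} is in the lexer.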
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 11d9dd60c2..ce88c483a2 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
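The information_schema change is purely cosmetic: 0xFFFF and 65535 are the same value, and the hexadecimal spelling makes the 16-bit masks easier to read. The sketch below mirrors those SQL expressions in C, assuming the usual encoding of a numeric typmod as ((precision << 16) | scale) plus a 4-byte VARHDRSZ, with a non-negative scale. The function names and SKETCH_VARHDRSZ are hypothetical.

```c
#include <stdint.h>

#define SKETCH_VARHDRSZ 4		/* assumption: PostgreSQL's VARHDRSZ */

/* Hypothetical encoder: typmod for numeric(precision, scale), scale >= 0. */
static int32_t
numeric_typmod_encode(int precision, int scale)
{
	return ((precision << 16) | scale) + SKETCH_VARHDRSZ;
}

/* Mirrors (($2 - 4) >> 16) & 0xFFFF in _pg_numeric_precision(). */
static int
numeric_typmod_precision(int32_t typmod)
{
	return ((typmod - SKETCH_VARHDRSZ) >> 16) & 0xFFFF;
}

/* Mirrors ($2 - 4) & 0xFFFF in _pg_numeric_scale(). */
static int
numeric_typmod_scale(int32_t typmod)
{
	return (typmod - SKETCH_VARHDRSZ) & 0xFFFF;
}
```

Under that encoding, numeric(10,2) has typmod 655366, from which the two masks recover precision 10 and scale 2. The SQL functions handle the unspecified typmod -1 separately before any masking.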
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index b8a78f4d41..545cb45131 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index f889c2faf7..c55338b601 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,25 +385,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
+param			\${decinteger}
 
 other			.
 
@@ -979,20 +994,44 @@ other			.
 					return PARAM;
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail}		{
+					SET_YYLLOC();
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					SET_YYLLOC();
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1007,11 +1046,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1307,17 +1358,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 18de54da40..358cee2ec4 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -120,6 +131,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -196,6 +250,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -204,6 +300,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -280,6 +377,48 @@ pg_strtoint64(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -288,6 +427,7 @@ pg_strtoint64(const char *s)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
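The new branches in pg_strtoint16/32/64() keep the existing technique of accumulating the value as a negative number (note the pg_sub_*_overflow() calls), so the most negative value, whose magnitude has no positive counterpart, can still be parsed and the sign is applied at the end. Below is a portable int32 sketch of that approach with prefix detection. strtoint32_radix() and radix_digit() are hypothetical; an explicit bound check stands in for pg_mul_s32_overflow()/pg_sub_s32_overflow(), and this sketch also insists on at least one digit after the prefix.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of c as a digit in the given base (2, 8, 10 or 16), or -1. */
static int
radix_digit(unsigned char c, int base)
{
	int			d;

	if (c >= '0' && c <= '9')
		d = c - '0';
	else if (c >= 'a' && c <= 'f')
		d = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		d = c - 'A' + 10;
	else
		return -1;
	return d < base ? d : -1;
}

/*
 * Hypothetical sketch: parse an int32 with optional surrounding whitespace,
 * an optional sign and an optional 0x/0o/0b prefix.  The value is built as
 * a negative number so that INT32_MIN is reachable.  Returns false on bad
 * syntax or overflow.
 */
static bool
strtoint32_radix(const char *s, int32_t *result)
{
	const char *ptr = s;
	bool		neg = false;
	int			base = 10;
	int32_t		tmp = 0;

	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr == '-')
	{
		neg = true;
		ptr++;
	}
	else if (*ptr == '+')
		ptr++;

	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
		base = 16;
	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
		base = 8;
	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
		base = 2;
	if (base != 10)
		ptr += 2;

	/* require at least one digit (after any prefix) */
	if (radix_digit((unsigned char) *ptr, base) < 0)
		return false;

	for (; *ptr; ptr++)
	{
		int			d = radix_digit((unsigned char) *ptr, base);

		if (d < 0)
			break;
		/*
		 * Need tmp * base - d >= INT32_MIN.  C division truncates toward
		 * zero, which rounds this negative bound up, so the test is exact.
		 */
		if (tmp < (INT32_MIN + d) / base)
			return false;
		tmp = tmp * base - d;
	}

	/* allow trailing whitespace, but not other trailing chars */
	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr != '\0')
		return false;

	if (!neg)
	{
		if (tmp == INT32_MIN)
			return false;		/* +2147483648 does not fit */
		tmp = -tmp;
	}
	*result = tmp;
	return true;
}
```

For example, -0x80000000 parses to INT32_MIN while 0x80000000 is rejected as out of range, which is exactly the asymmetry the negative accumulation is meant to handle.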
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 09709e6151..af38f173fa 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,25 +323,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
+param			\${decinteger}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -844,13 +859,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -864,10 +897,19 @@ other			.
 {realfail2}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 110478059b..c4805bd91f 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,25 +351,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
+param			\${decinteger}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -399,9 +414,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -414,7 +426,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -929,17 +941,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
+				}
+{hexinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 16);
 				}
-{decimal}		{
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -948,18 +963,25 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is an {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
+<SQL>{octinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 8);
+				}
+<SQL>{bininteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 2);
+				}
+
 <SQL>{
 :{identifier}((("->"|\.){identifier})|(\[{array}\]))*	{
 					base_yylval.str = mm_strdup(yytext+1);
@@ -1015,19 +1037,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *)yytext,&endptr,16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1552,17 +1561,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..220e1493e8 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..6fdbd58b40 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,22 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..edd15a4353 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 2f176ccb52..be3868d40f 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,33 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -41,6 +60,42 @@ SELECT 0.0e+a;
 ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
                ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+               ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+               ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+               ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..0dee22fe6d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..2a69b1614e 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,10 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..b7ad696dd8 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 70447a95fa..fd7e02e536 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,16 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -17,6 +23,19 @@
 SELECT 0.0e;
 SELECT 0.0e+a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1
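The base selection in these rules can be tried outside the scanner. The following is a minimal standalone sketch, not code from the patch; the helper names `digit_value()` and `parse_int_literal()` are hypothetical. It reads the two-character prefix to pick the base and rejects a bare prefix or any character that is not a digit of that base. Only then does it hand the remaining digits to `strtol()`. This is similar to how the ecpg rules pass `yytext + 2` together with the base to `process_integer_literal()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Value of a digit character in bases up to 16, or -1 if it is not one. */
static int
digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper (not part of the patch): parse "42", "0x42F", "0o273"
 * or "0b100101" into *result.  The two-character prefix picks the base and
 * is skipped before calling strtol().  Returns false for a bare prefix such
 * as "0x", for any character that is not a digit of the chosen base, and
 * on overflow.
 */
static bool
parse_int_literal(const char *s, long *result)
{
	int			base = 10;
	const char *digits = s;
	char	   *endptr;
	long		val;

	if (s[0] == '0' && s[1] != '\0')
	{
		switch (s[1])
		{
			case 'x':
			case 'X':
				base = 16;
				break;
			case 'o':
			case 'O':
				base = 8;
				break;
			case 'b':
			case 'B':
				base = 2;
				break;
		}
		if (base != 10)
			digits = s + 2;		/* skip the 0x/0o/0b prefix */
	}
	assert(base == 2 || base == 8 || base == 10 || base == 16);

	/* Require at least one digit, and only digits valid in this base. */
	if (*digits == '\0')
		return false;
	for (const char *p = digits; *p; p++)
	{
		int			d = digit_value(*p);

		if (d < 0 || d >= base)
			return false;
	}

	errno = 0;
	val = strtol(digits, &endptr, base);
	if (errno == ERANGE || *endptr != '\0')
		return false;
	*result = val;
	return true;
}
```

With this helper, "0x42F", "0o273" and "0b100101" yield 1071, 187 and 37, matching the regression test output above. The helper rejects "0x", "1x" and "0x0y". Unlike the real scanner, it does not distinguish a bare prefix (the "invalid hexadecimal integer" case) from trailing junk.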

v6-0007-WIP-Underscores-in-numeric-literals.patch

From fb17e09849c74414947a4107dd13883b7347629c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 7/7] WIP: Underscores in numeric literals

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/Makefile              |  2 +-
 src/backend/parser/scan.l                | 26 +++++++++++++++---
 src/test/regress/expected/numerology.out | 34 +++++++++++++++++++++---
 src/test/regress/sql/numerology.sql      |  7 ++++-
 4 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
index 5ddb9a92f0..827bc4c189 100644
--- a/src/backend/parser/Makefile
+++ b/src/backend/parser/Makefile
@@ -56,7 +56,7 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s
 
 
 scan.c: FLEXFLAGS = -CF -p -p
-scan.c: FLEX_NO_BACKUP=yes
+#scan.c: FLEX_NO_BACKUP=yes
 scan.c: FLEX_FIX_WARNING=yes
 
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index c55338b601..7b6e6e3c9e 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -395,10 +395,10 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
 hexfail			0[xX]
 octfail			0[oO]
@@ -1367,6 +1367,24 @@ process_integer_literal(const char *token, YYSTYPE *lval, int base)
 	int			val;
 	char	   *endptr;
 
+	if (strchr(token, '_'))
+	{
+		char	   *newtoken = palloc(strlen(token));
+		const char *p1;
+		char	   *p2;
+
+		p1 = token;
+		p2 = newtoken;
+		while (*p1)
+		{
+			if (*p1 != '_')
+				*p2++ = *p1;
+			p1++;
+		}
+		*p2 = '\0';
+		token = newtoken;
+	}
+
 	errno = 0;
 	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index be3868d40f..cf5d528558 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -23,6 +23,36 @@ SELECT 0x42F;
      1071
 (1 row)
 
+SELECT 1_000_000;
+ ?column? 
+----------
+  1000000
+(1 row)
+
+SELECT 1_2_3;
+ ?column? 
+----------
+      123
+(1 row)
+
+SELECT 0x1EEE_FFFF;
+ ?column?  
+-----------
+ 518979583
+(1 row)
+
+SELECT 0o2_73;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0b_10_0101;
+ ?column? 
+----------
+       37
+(1 row)
+
 -- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
@@ -32,10 +62,6 @@ SELECT 0x0o;
 ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
-SELECT 1_2_3;
-ERROR:  trailing junk after numeric literal at or near "1_"
-LINE 1: SELECT 1_2_3;
-               ^
 SELECT 0.a;
 ERROR:  trailing junk after numeric literal at or near "0.a"
 LINE 1: SELECT 0.a;
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fd7e02e536..970654f0b7 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -12,10 +12,15 @@
 SELECT 0o273;
 SELECT 0x42F;
 
+SELECT 1_000_000;
+SELECT 1_2_3;
+SELECT 0x1EEE_FFFF;
+SELECT 0o2_73;
+SELECT 0b_10_0101;
+
 -- error cases
 SELECT 123abc;
 SELECT 0x0o;
-SELECT 1_2_3;
 SELECT 0.a;
 SELECT 0.0a;
 SELECT .0a;
-- 
2.34.1
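The lexer first accepts a token against patterns such as `{decdigit}(_?{decdigit})*`. After that, converting the token only requires dropping the underscores, which is what the new block in process_integer_literal() does. The following is a minimal standalone sketch of that step. It assumes the caller has already skipped any 0x/0o/0b prefix, as the ecpg rules do with `yytext + 2`. `parse_digits_with_underscores()` is a hypothetical name, and the sketch uses malloc()/free() instead of palloc(). The patch allocates strlen(token) bytes, which is enough because at least one underscore is removed. The sketch simply allocates one byte more.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: strip underscores from an already-validated digit
 * string (prefix excluded) and convert it in the given base.  Returns false
 * if malloc fails, no digits remain, or the value does not fit in a long.
 */
static bool
parse_digits_with_underscores(const char *digits, int base, long *result)
{
	char	   *clean = malloc(strlen(digits) + 1);
	char	   *q = clean;
	char	   *endptr;
	long		val;
	bool		ok;

	assert(base >= 2 && base <= 36);	/* bases strtol() accepts */
	if (clean == NULL)
		return false;

	/* copy everything except '_' */
	for (const char *p = digits; *p; p++)
	{
		if (*p != '_')
			*q++ = *p;
	}
	*q = '\0';

	errno = 0;
	val = strtol(clean, &endptr, base);
	ok = (endptr != clean && *endptr == '\0' && errno != ERANGE);
	free(clean);
	if (ok)
		*result = val;
	return ok;
}
```

With the arguments ("1_000_000", 10), ("1EEE_FFFF", 16) and ("_10_0101", 2), the helper returns 1000000, 518979583 and 37. These are the values the WIP regression output expects for 1_000_000, 0x1EEE_FFFF and 0b_10_0101.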

#17

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Peter Eisentraut (#16)

7 attachment(s)

Re: Non-decimal integer literals

Another modest update, because of the copyright year update preventing
the previous patches from applying cleanly.

I also did a bit of work on the ecpg scanner so that it handles some
errors the same way as the main scanner.

There is still no automated testing of this in ecpg, but I have a bunch
of single-line test files that can provoke various errors. I will keep
these around and maybe put them into something more formal in the future.


On 30.12.21 10:43, Peter Eisentraut wrote:

There has been some other refactoring going on, which made this patch
set out of date. So here is an update.

The old pg_strtouint64() has been removed, so there is no longer a
naming concern with patch 0001. That one should be good to go.

I also found that pg_atoi(), yet another way to parse integers, has
mostly outlived its usefulness, so 0002 and 0003 remove its last two
callers and then the function itself.

The remaining patches are as before, with some of the review comments
applied. I still need to write some lexing unit tests for ecpg, which I
haven't gotten to yet. This affects patches 0004 and 0005.

As mentioned before, patches 0006 and 0007 are more feature previews at
this point.

On 01.12.21 16:47, Peter Eisentraut wrote:
On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now,
if you like.

done
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
  realfail1 ({integer}|{decimal})[Ee]
  realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}
A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.
Yeah, it's a bit weird that not all the symbols are used in ecpg.
I'll look into explaining this better.
0006
+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
   }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }
It seems these could use SET_YYLLOC(), since the error cursor doesn't
match other failure states:
ok

We might consider some tests for ECPG since lack of coverage has been
a problem.

right

Also, I'm curious: how does the spec work as far as deciding the year
of release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1. But the preparation work (up to v5-0005, and some other
number parsing refactoring that I'm seeing) could be considered for PG15.

I'll move this to the next CF and come back with an updated patch set
in a little while.

Attachments:

v7-0001-Move-scanint8-to-numutils.c.patch

From e7aad2b81e9be2b53dad73c66e692a80fc2f81e1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 1/7] Move scanint8() to numutils.c

Move scanint8() to numutils.c and rename to pg_strtoint64().  We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent.  The API is also changed to no longer provide the errorOK
case.  Users that need the error checking can use strtoi64().

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/parse_node.c             | 12 ++-
 src/backend/replication/pgoutput/pgoutput.c |  9 ++-
 src/backend/utils/adt/int8.c                | 90 +--------------------
 src/backend/utils/adt/numutils.c            | 84 +++++++++++++++++++
 src/bin/pgbench/pgbench.c                   |  4 +-
 src/include/utils/builtins.h                |  1 +
 src/include/utils/int8.h                    | 25 ------
 7 files changed, 103 insertions(+), 122 deletions(-)
 delete mode 100644 src/include/utils/int8.h

diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index ba9baf140c..8dd821b761 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -26,7 +26,6 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
@@ -353,7 +352,6 @@ make_const(ParseState *pstate, A_Const *aconst)
 {
 	Const	   *con;
 	Datum		val;
-	int64		val64;
 	Oid			typeid;
 	int			typelen;
 	bool		typebyval;
@@ -384,8 +382,15 @@ make_const(ParseState *pstate, A_Const *aconst)
 			break;
 
 		case T_Float:
+		{
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(aconst->val.fval.val, true, &val64))
+
+			int64		val64;
+			char	   *endptr;
+
+			errno = 0;
+			val64 = strtoi64(aconst->val.fval.val, &endptr, 10);
+			if (errno == 0 && *endptr == '\0')
 			{
 				/*
 				 * It might actually fit in int32. Probably only INT_MIN can
@@ -425,6 +430,7 @@ make_const(ParseState *pstate, A_Const *aconst)
 				typebyval = false;
 			}
 			break;
+		}
 
 		case T_String:
 
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index af8d51aee9..0570caa351 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -21,7 +21,6 @@
 #include "replication/logicalproto.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
-#include "utils/int8.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -205,7 +204,8 @@ parse_output_parameters(List *options, PGOutputData *data)
 		/* Check each param, whether or not we recognize it */
 		if (strcmp(defel->defname, "proto_version") == 0)
 		{
-			int64		parsed;
+			unsigned long parsed;
+			char	   *endptr;
 
 			if (protocol_version_given)
 				ereport(ERROR,
@@ -213,12 +213,13 @@ parse_output_parameters(List *options, PGOutputData *data)
 						 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
 
-			if (!scanint8(strVal(defel->arg), true, &parsed))
+			parsed = strtoul(strVal(defel->arg), &endptr, 10);
+			if (errno || *endptr != '\0')
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid proto_version")));
 
-			if (parsed > PG_UINT32_MAX || parsed < 0)
+			if (parsed > PG_UINT32_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("proto_version \"%s\" out of range",
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index ad19d154ff..4a87114a4f 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -24,7 +24,6 @@
 #include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 
 
 typedef struct
@@ -45,99 +44,14 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is bad.
- * If errorOK is true, just return "false" for bad input.
- */
-bool
-scanint8(const char *str, bool errorOK, int64 *result)
-{
-	const char *ptr = str;
-	int64		tmp = 0;
-	bool		neg = false;
-
-	/*
-	 * Do our own scan, rather than relying on sscanf which might be broken
-	 * for long long.
-	 *
-	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
-	 * value as a negative number.
-	 */
-
-	/* skip leading spaces */
-	while (*ptr && isspace((unsigned char) *ptr))
-		ptr++;
-
-	/* handle sign */
-	if (*ptr == '-')
-	{
-		ptr++;
-		neg = true;
-	}
-	else if (*ptr == '+')
-		ptr++;
-
-	/* require at least one digit */
-	if (unlikely(!isdigit((unsigned char) *ptr)))
-		goto invalid_syntax;
-
-	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
-	{
-		int8		digit = (*ptr++ - '0');
-
-		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
-	}
-
-	/* allow trailing whitespace, but not other trailing chars */
-	while (*ptr != '\0' && isspace((unsigned char) *ptr))
-		ptr++;
-
-	if (unlikely(*ptr != '\0'))
-		goto invalid_syntax;
-
-	if (!neg)
-	{
-		/* could fail if input is most negative number */
-		if (unlikely(tmp == PG_INT64_MIN))
-			goto out_of_range;
-		tmp = -tmp;
-	}
-
-	*result = tmp;
-	return true;
-
-out_of_range:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("value \"%s\" is out of range for type %s",
-						str, "bigint")));
-	return false;
-
-invalid_syntax:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"bigint", str)));
-	return false;
-}
-
 /* int8in()
  */
 Datum
 int8in(PG_FUNCTION_ARGS)
 {
-	char	   *str = PG_GETARG_CSTRING(0);
-	int64		result;
+	char	   *num = PG_GETARG_CSTRING(0);
 
-	(void) scanint8(str, false, &result);
-	PG_RETURN_INT64(result);
+	PG_RETURN_INT64(pg_strtoint64(num));
 }
 
 
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 898a9e3f9a..e82d23a325 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -325,6 +325,90 @@ pg_strtoint32(const char *s)
 	return 0;					/* keep compiler quiet */
 }
 
+/*
+ * Convert input string to a signed 64 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. Will throw
+ * ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int64
+pg_strtoint64(const char *s)
+{
+	const char *ptr = s;
+	int64		tmp = 0;
+	bool		neg = false;
+
+	/*
+	 * Do our own scan, rather than relying on sscanf which might be broken
+	 * for long long.
+	 *
+	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
+	 * value as a negative number.
+	 */
+
+	/* skip leading spaces */
+	while (*ptr && isspace((unsigned char) *ptr))
+		ptr++;
+
+	/* handle sign */
+	if (*ptr == '-')
+	{
+		ptr++;
+		neg = true;
+	}
+	else if (*ptr == '+')
+		ptr++;
+
+	/* require at least one digit */
+	if (unlikely(!isdigit((unsigned char) *ptr)))
+		goto invalid_syntax;
+
+	/* process digits */
+	while (*ptr && isdigit((unsigned char) *ptr))
+	{
+		int8		digit = (*ptr++ - '0');
+
+		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
+			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+			goto out_of_range;
+	}
+
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+		ptr++;
+
+	if (unlikely(*ptr != '\0'))
+		goto invalid_syntax;
+
+	if (!neg)
+	{
+		/* could fail if input is most negative number */
+		if (unlikely(tmp == PG_INT64_MIN))
+			goto out_of_range;
+		tmp = -tmp;
+	}
+
+	return tmp;
+
+out_of_range:
+	ereport(ERROR,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value \"%s\" is out of range for type %s",
+					s, "bigint")));
+
+invalid_syntax:
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"bigint", s)));
+
+	return 0;					/* keep compiler quiet */
+}
+
 /*
  * pg_itoa: converts a signed 16-bit integer to its string representation
  * and returns strlen(a).
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 97f2a1f80a..f166a77e3a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -787,8 +787,8 @@ is_an_int(const char *str)
 /*
  * strtoint64 -- convert a string to 64-bit integer
  *
- * This function is a slightly modified version of scanint8() from
- * src/backend/utils/adt/int8.c.
+ * This function is a slightly modified version of pg_strtoint64() from
+ * src/backend/utils/adt/numutils.c.
  *
  * The function returns whether the conversion worked, and if so
  * "*result" is set to the result.
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 7ac4780e3f..191cc854a3 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -46,6 +46,7 @@ extern int	namestrcmp(Name name, const char *str);
 extern int32 pg_atoi(const char *s, int size, int c);
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
+extern int64 pg_strtoint64(const char *s);
 extern int	pg_itoa(int16 i, char *a);
 extern int	pg_ultoa_n(uint32 l, char *a);
 extern int	pg_ulltoa_n(uint64 l, char *a);
diff --git a/src/include/utils/int8.h b/src/include/utils/int8.h
deleted file mode 100644
index f0386c4008..0000000000
--- a/src/include/utils/int8.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * int8.h
- *	  Declarations for operations on 64-bit integers.
- *
- *
- * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * src/include/utils/int8.h
- *
- * NOTES
- * These data types are supported on all 64-bit architectures, and may
- *	be supported through libraries on some 32-bit machines. If your machine
- *	is not currently supported, then please try to make it so, then post
- *	patches to the postgresql.org hackers mailing list.
- *
- *-------------------------------------------------------------------------
- */
-#ifndef INT8_H
-#define INT8_H
-
-extern bool scanint8(const char *str, bool errorOK, int64 *result);
-
-#endif							/* INT8_H */

base-commit: bed6ed3de9b3e62d8c6ee034513d04d769091927
-- 
2.34.1

v7-0002-Remove-one-use-of-pg_atoi.patch

From 15bc1f99665a2c52adb2282a4e65d0a628ecaf9b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 2/7] Remove one use of pg_atoi()

There was no real need to use this here instead of a simpler API.
---
 src/backend/utils/adt/jsonpath_gram.y | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 7a251b892d..7311d12e35 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -232,7 +232,7 @@ array_accessor:
 	;
 
 any_level:
-	INT_P							{ $$ = pg_atoi($1.val, 4, 0); }
+	INT_P							{ $$ = pg_strtoint32($1.val); }
 	| LAST_P						{ $$ = -1; }
 	;
 
-- 
2.34.1
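The replacements for scanint8() and pg_atoi() in this series fall back on the standard strtol()/strtoul() idiom. These are the make_const() and pgoutput changes in 0001 and the int2vectorin() rewrite in 0003. The idiom is to clear errno and call the function, then check endptr for progress and trailing characters and errno for overflow. Clearing errno matters because these functions set it on overflow but never set it to zero. The following sketch is loosely modeled on the rewritten int2vectorin() and parses a whitespace-separated list of smallint values. parse_int16_list() is a hypothetical name. Unlike the patch, it accepts any whitespace separator and reports errors through its return value rather than ereport().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: parse up to maxvals whitespace-separated int16
 * values from s into vals.  Returns the number of values parsed, or -1 on
 * bad syntax, out-of-range values, or too many values.
 */
static int
parse_int16_list(const char *s, short *vals, int maxvals)
{
	int			n = 0;

	assert(maxvals >= 0);
	for (;;)
	{
		long		l;
		char	   *endp;

		while (*s && isspace((unsigned char) *s))
			s++;
		if (*s == '\0')
			break;
		if (n >= maxvals)
			return -1;			/* too many values */

		/* strtol() never clears errno, so do it before the call */
		errno = 0;
		l = strtol(s, &endp, 10);
		if (endp == s)
			return -1;			/* no digits at all */
		if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
			return -1;			/* out of range for int16 */
		if (*endp && !isspace((unsigned char) *endp))
			return -1;			/* junk right after the number */

		vals[n++] = (short) l;
		s = endp;
	}
	return n;
}
```

For example, " 1 -2  32767 " yields three values, while "32768" and "1x" are rejected.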

v7-0003-Remove-pg_atoi.patch

From dcbc44a62d06d660314305dff4919041b7408f63 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 3/7] Remove pg_atoi()

The last caller was int2vectorin(), and having such a general function
for one user didn't seem useful, so just put the required parts inline
and remove the function.
---
 src/backend/utils/adt/int.c      | 32 ++++++++++--
 src/backend/utils/adt/numutils.c | 88 --------------------------------
 src/include/utils/builtins.h     |  1 -
 3 files changed, 28 insertions(+), 93 deletions(-)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 8bd234c11c..42ddae99ef 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -146,15 +146,39 @@ int2vectorin(PG_FUNCTION_ARGS)
 
 	result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
 
-	for (n = 0; *intString && n < FUNC_MAX_ARGS; n++)
+	for (n = 0; n < FUNC_MAX_ARGS; n++)
 	{
+		long		l;
+		char	   *endp;
+
 		while (*intString && isspace((unsigned char) *intString))
 			intString++;
 		if (*intString == '\0')
 			break;
-		result->values[n] = pg_atoi(intString, sizeof(int16), ' ');
-		while (*intString && !isspace((unsigned char) *intString))
-			intString++;
+
+		errno = 0;
+		l = strtol(intString, &endp, 10);
+
+		if (intString == endp)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s: \"%s\"",
+							"smallint", intString)));
+
+		if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
+			ereport(ERROR,
+					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					 errmsg("value \"%s\" is out of range for type %s", intString,
+							"smallint")));
+
+		if (*endp && *endp != ' ')
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s: \"%s\"",
+							"smallint", intString)));
+
+		result->values[n] = l;
+		intString = endp;
 	}
 	while (*intString && isspace((unsigned char) *intString))
 		intString++;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index e82d23a325..cc3f95d399 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,94 +85,6 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
-/*
- * pg_atoi: convert string to integer
- *
- * allows any number of leading or trailing whitespace characters.
- *
- * 'size' is the sizeof() the desired integral result (1, 2, or 4 bytes).
- *
- * c, if not 0, is a terminator character that may appear after the
- * integer (plus whitespace).  If 0, the string must end after the integer.
- *
- * Unlike plain atoi(), this will throw ereport() upon bad input format or
- * overflow.
- */
-int32
-pg_atoi(const char *s, int size, int c)
-{
-	long		l;
-	char	   *badp;
-
-	/*
-	 * Some versions of strtol treat the empty string as an error, but some
-	 * seem not to.  Make an explicit test to be sure we catch it.
-	 */
-	if (s == NULL)
-		elog(ERROR, "NULL pointer");
-	if (*s == 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	errno = 0;
-	l = strtol(s, &badp, 10);
-
-	/* We made no progress parsing the string, so bail out */
-	if (s == badp)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	switch (size)
-	{
-		case sizeof(int32):
-			if (errno == ERANGE
-#if defined(HAVE_LONG_INT_64)
-			/* won't get ERANGE on these with 64-bit longs... */
-				|| l < INT_MIN || l > INT_MAX
-#endif
-				)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for type %s", s,
-								"integer")));
-			break;
-		case sizeof(int16):
-			if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for type %s", s,
-								"smallint")));
-			break;
-		case sizeof(int8):
-			if (errno == ERANGE || l < SCHAR_MIN || l > SCHAR_MAX)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for 8-bit integer", s)));
-			break;
-		default:
-			elog(ERROR, "unsupported result size: %d", size);
-	}
-
-	/*
-	 * Skip any trailing whitespace; if anything but whitespace remains before
-	 * the terminating character, bail out
-	 */
-	while (*badp && *badp != c && isspace((unsigned char) *badp))
-		badp++;
-
-	if (*badp && *badp != c)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	return (int32) l;
-}
-
 /*
  * Convert input string to a signed 16 bit integer.
  *
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 191cc854a3..58abf4364a 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -43,7 +43,6 @@ extern void namestrcpy(Name name, const char *str);
 extern int	namestrcmp(Name name, const char *str);
 
 /* numutils.c */
-extern int32 pg_atoi(const char *s, int size, int c);
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
 extern int64 pg_strtoint64(const char *s);
-- 
2.34.1
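
The checks that int2vectorin() now performs inline in patch 0003 can be tried outside the backend. What follows is a minimal standalone sketch, not backend code: parse_int16_list() and the parse_status codes are hypothetical names, and returned status codes take the place of ereport(). It keeps the patch's rules: strtol() must consume at least one character, the value must fit in SHRT_MIN..SHRT_MAX, and a number must be followed by a space or the end of the input.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Result codes for the hypothetical parse_int16_list() sketch. */
enum parse_status
{
	PARSE_OK = 0,
	PARSE_SYNTAX_ERROR,			/* no digits, or junk right after a number */
	PARSE_OUT_OF_RANGE,			/* value does not fit in a 16-bit integer */
	PARSE_TOO_MANY				/* more than max_values numbers */
};

/*
 * Parse space-separated decimal integers into out[].  On PARSE_OK, *count
 * receives the number of values stored.
 */
static enum parse_status
parse_int16_list(const char *s, short *out, int max_values, int *count)
{
	int			n = 0;

	for (;;)
	{
		long		l;
		char	   *endp;

		/* skip separators; stop cleanly at end of input */
		while (*s && isspace((unsigned char) *s))
			s++;
		if (*s == '\0')
			break;
		if (n >= max_values)
			return PARSE_TOO_MANY;

		errno = 0;
		l = strtol(s, &endp, 10);
		if (endp == s)
			return PARSE_SYNTAX_ERROR;	/* strtol() made no progress */
		if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
			return PARSE_OUT_OF_RANGE;
		if (*endp && *endp != ' ')
			return PARSE_SYNTAX_ERROR;	/* e.g. "12x" */

		out[n++] = (short) l;
		s = endp;
	}
	*count = n;
	return PARSE_OK;
}
```

As in the patch, only a plain space is accepted directly after a number, so "1\t2" is rejected even though a tab is skipped as a leading separator.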

v7-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patch

From fb224fec2251b61cc5cf57806b6741db8f8cc58c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 4/7] Add test case for trailing junk after numeric literals

PostgreSQL currently accepts numeric literals with trailing
non-digits, such as 123abc where the abc is treated as the next token.
This may be a bit surprising.  This commit adds test cases for this;
subsequent commits intend to change this behavior.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/test/regress/expected/numerology.out | 62 ++++++++++++++++++++++++
 src/test/regress/sql/numerology.sql      | 16 ++++++
 2 files changed, 78 insertions(+)

diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 44d6c435de..2ffc73e854 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -2,6 +2,68 @@
 -- NUMEROLOGY
 -- Test various combinations of numeric types and functions.
 --
+--
+-- Trailing junk in numeric literals
+--
+SELECT 123abc;
+ abc 
+-----
+ 123
+(1 row)
+
+SELECT 0x0o;
+ x0o 
+-----
+   0
+(1 row)
+
+SELECT 1_2_3;
+ _2_3 
+------
+    1
+(1 row)
+
+SELECT 0.a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT .0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e1a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0e;
+  e  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e+a;
+ERROR:  syntax error at or near "+"
+LINE 1: SELECT 0.0e+a;
+                   ^
+PREPARE p1 AS SELECT $1a;
+EXECUTE p1(1);
+ a 
+---
+ 1
+(1 row)
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fddb58f8fd..fb75f97832 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,6 +3,22 @@
 -- Test various combinations of numeric types and functions.
 --
 
+--
+-- Trailing junk in numeric literals
+--
+
+SELECT 123abc;
+SELECT 0x0o;
+SELECT 1_2_3;
+SELECT 0.a;
+SELECT 0.0a;
+SELECT .0a;
+SELECT 0.0e1a;
+SELECT 0.0e;
+SELECT 0.0e+a;
+PREPARE p1 AS SELECT $1a;
+EXECUTE p1(1);
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1

v7-0005-Reject-trailing-junk-after-numeric-literals.patch

From ac3b6ac952624ded1c9aefe4f3e8a6715f4bb1d9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 5/7] Reject trailing junk after numeric literals

After this, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which would be scanned as
two tokens: 123 and abc.  This is undocumented and surprising, and it
might also interfere with some extended numeric literal syntax being
contemplated for the future.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/scan.l                | 32 +++++++---
 src/fe_utils/psqlscan.l                  | 25 +++++---
 src/interfaces/ecpg/preproc/pgc.l        | 22 +++++++
 src/test/regress/expected/numerology.out | 77 +++++++++---------------
 src/test/regress/sql/numerology.sql      |  1 -
 5 files changed, 91 insertions(+), 66 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index f555ac6e6d..ab24bf70db 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -399,7 +399,12 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
+param_junk		\${integer}{ident_start}
 
 other			.
 
@@ -974,6 +979,10 @@ other			.
 					yylval->ival = atol(yytext + 1);
 					return PARAM;
 				}
+{param_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after parameter");
+				}
 
 {integer}		{
 					SET_YYLLOC();
@@ -996,19 +1005,24 @@ other			.
 					return FCONST;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 */
-					yyless(yyleng - 1);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
+				}
+{integer_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{decimal_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{real_junk}		{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
 				}
 
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 941ed06553..0394edb15f 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -337,7 +337,12 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
+param_junk		\${integer}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -839,6 +844,9 @@ other			.
 {param}			{
 					ECHO;
 				}
+{param_junk}	{
+					ECHO;
+				}
 
 {integer}		{
 					ECHO;
@@ -855,17 +863,18 @@ other			.
 					ECHO;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 * (in psql, we don't actually care...)
-					 */
-					yyless(yyleng - 1);
 					ECHO;
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
+					ECHO;
+				}
+{integer_junk}	{
+					ECHO;
+				}
+{decimal_junk}	{
+					ECHO;
+				}
+{real_junk}		{
 					ECHO;
 				}
 
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 39e578e868..25fb3b43b3 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,7 +365,12 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
+param_junk		\${integer}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -917,6 +922,9 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 					base_yylval.ival = atol(yytext+1);
 					return PARAM;
 				}
+{param_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after parameter");
+				}
 
 {ip}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -957,6 +965,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 } /* <C,SQL> */
 
 <SQL>{
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */
+{integer_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{decimal_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{real_junk}		{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+
 :{identifier}((("->"|\.){identifier})|(\[{array}\]))*	{
 					base_yylval.str = mm_strdup(yytext+1);
 					return CVARIABLE;
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 2ffc73e854..77d4843417 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,64 +6,45 @@
 -- Trailing junk in numeric literals
 --
 SELECT 123abc;
- abc 
------
- 123
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+               ^
 SELECT 0x0o;
- x0o 
------
-   0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+               ^
 SELECT 1_2_3;
- _2_3 
-------
-    1
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+               ^
 SELECT 0.a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+               ^
 SELECT 0.0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+               ^
 SELECT .0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+               ^
 SELECT 0.0e1a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+               ^
 SELECT 0.0e;
-  e  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+               ^
 SELECT 0.0e+a;
-ERROR:  syntax error at or near "+"
+ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
-                   ^
+               ^
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
- a 
----
- 1
-(1 row)
-
+ERROR:  trailing junk after parameter at or near "$1a"
+LINE 1: PREPARE p1 AS SELECT $1a;
+                             ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fb75f97832..be7d6dfe0c 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -17,7 +17,6 @@
 SELECT 0.0e;
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
 
 --
 -- Test implicit type conversions
-- 
2.34.1
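
A simplified model of the rules patch 0005 adds to scan.l may help when reading the expected output. It rests on stated assumptions: decimal literals only (no $n parameters and none of the non-decimal forms), and the identifier-start class copied from {ident_start}. scan_number() and is_ident_start() are hypothetical names. On junk, the reported length includes the first offending character, mirroring the "at or near" text in the regression output of this patch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Result of the hypothetical scan_number() below. */
enum number_scan
{
	NUM_NONE,					/* input does not start with a number */
	NUM_OK,						/* a clean literal of *len characters */
	NUM_TRAILING_JUNK			/* *len covers the literal plus the junk start */
};

/* Same character class as {ident_start} in scan.l: [A-Za-z\200-\377_] */
static bool
is_ident_start(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		c == '_' || c >= 0x80;
}

static enum number_scan
scan_number(const char *s, size_t *len)
{
	size_t		i = 0;
	bool		have_digits = false;

	while (isdigit((unsigned char) s[i]))
	{
		i++;
		have_digits = true;
	}
	/* optional fraction, but leave "1..10" alone ({decimalfail}) */
	if (s[i] == '.' && s[i + 1] != '.')
	{
		i++;
		while (isdigit((unsigned char) s[i]))
		{
			i++;
			have_digits = true;
		}
	}
	if (!have_digits)
		return NUM_NONE;

	if (s[i] == 'e' || s[i] == 'E')
	{
		size_t		j = i + 1;

		if (s[j] == '+' || s[j] == '-')
			j++;
		if (!isdigit((unsigned char) s[j]))
		{
			/* {realfail1} or {realfail2}: exponent marker without digits */
			*len = j;
			return NUM_TRAILING_JUNK;
		}
		while (isdigit((unsigned char) s[j]))
			j++;
		i = j;
	}

	/* {integer_junk}, {decimal_junk}, {real_junk} */
	if (is_ident_start((unsigned char) s[i]))
	{
		*len = i + 1;
		return NUM_TRAILING_JUNK;
	}
	*len = i;
	return NUM_OK;
}
```

For example, scan_number("0.0e+a", &len) reports junk with len == 5, matching the "0.0e+" in the expected error, while "1..10" still yields a clean one-character literal.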

v7-0006-Non-decimal-integer-literals.patch

From d40d84e76525f732ee8a07ffd62c68db5368c842 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 6/7] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  26 ++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/scan.l                  | 101 +++++++++++----
 src/backend/utils/adt/numutils.c           | 140 +++++++++++++++++++++
 src/fe_utils/psqlscan.l                    |  80 +++++++++---
 src/interfaces/ecpg/preproc/pgc.l          | 116 +++++++++--------
 src/test/regress/expected/int2.out         |  19 +++
 src/test/regress/expected/int4.out         |  19 +++
 src/test/regress/expected/int8.out         |  19 +++
 src/test/regress/expected/numerology.out   |  59 ++++++++-
 src/test/regress/sql/int2.sql              |   7 ++
 src/test/regress/sql/int4.sql              |   7 ++
 src/test/regress/sql/int8.sql              |   7 ++
 src/test/regress/sql/numerology.sql        |  21 +++-
 15 files changed, 529 insertions(+), 99 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), and <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index b4f348a24d..1957fc6e2d 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index b8a78f4d41..545cb45131 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
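The information_schema.sql hunks in this patch only respell 65535 as 0xFFFF; both denote the same 16-bit mask. A C transcription of the numeric precision and scale expressions is sketched below. It assumes the typmod layout those SQL expressions imply, ((precision << 16) | scale) + 4 with a non-negative scale, and that the 4 is VARHDRSZ; all function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: the "- 4" in the SQL functions is VARHDRSZ (4 bytes). */
#define SKETCH_VARHDRSZ 4

/* Hypothetical inverse of the decoding below, used to build test inputs. */
static int32_t
make_numeric_typmod_sketch(int precision, int scale)
{
	return ((precision << 16) | scale) + SKETCH_VARHDRSZ;
}

/* Mirrors (($2 - 4) >> 16) & 0xFFFF; -1 stands in for SQL NULL. */
static int
numeric_precision_sketch(int32_t typmod)
{
	if (typmod == -1)
		return -1;
	return ((typmod - SKETCH_VARHDRSZ) >> 16) & 0xFFFF;
}

/* Mirrors ($2 - 4) & 0xFFFF; -1 stands in for SQL NULL. */
static int
numeric_scale_sketch(int32_t typmod)
{
	if (typmod == -1)
		return -1;
	return (typmod - SKETCH_VARHDRSZ) & 0xFFFF;
}
```

Worked through for numeric(10,2): (10 << 16) | 2 = 655362, plus 4 gives 655366; subtracting 4 and shifting right by 16 recovers 10, and masking with 0xFFFF recovers 2.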
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index ab24bf70db..2e1aa62d81 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,26 +385,41 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 other			.
 
@@ -984,20 +999,44 @@ other			.
 					yyerror("trailing junk after parameter");
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail}		{
+					SET_YYLLOC();
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					SET_YYLLOC();
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1012,11 +1051,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1312,17 +1363,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index cc3f95d399..37364921d5 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -120,6 +131,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -196,6 +250,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -204,6 +300,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -280,6 +377,48 @@ pg_strtoint64(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -288,6 +427,7 @@ pg_strtoint64(const char *s)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
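The three input functions share one pattern: detect the prefix, then accumulate digits as a negative number using overflow-checked multiply and subtract, so that the most negative value is representable. Below is a standalone 32-bit sketch of that pattern under stated assumptions. strtoint32_radix() is a hypothetical name, and the GCC/Clang builtins __builtin_mul_overflow() and __builtin_sub_overflow() stand in for pg_mul_s32_overflow() and pg_sub_s32_overflow(). Digit values are computed by comparison rather than with a hexlookup-style table.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Accepts optional whitespace, an optional sign, an optional 0x/0o/0b
 * prefix, at least one digit, and optional trailing whitespace.
 */
static bool
strtoint32_radix(const char *s, int32_t *result)
{
	const char *ptr = s;
	const char *first_digit;
	bool		neg = false;
	int32_t		base = 10;
	int32_t		tmp = 0;		/* accumulated as a negative value */

	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr == '-' || *ptr == '+')
		neg = (*ptr++ == '-');

	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
		base = 16;
	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
		base = 8;
	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
		base = 2;
	if (base != 10)
		ptr += 2;

	first_digit = ptr;
	for (;;)
	{
		unsigned char c = (unsigned char) *ptr;
		int32_t		digit;

		if (c >= '0' && c <= '9')
			digit = c - '0';
		else if (c >= 'a' && c <= 'f')
			digit = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			digit = c - 'A' + 10;
		else
			break;
		if (digit >= base)
			break;

		if (__builtin_mul_overflow(tmp, base, &tmp) ||
			__builtin_sub_overflow(tmp, digit, &tmp))
			return false;		/* out of range */
		ptr++;
	}
	if (ptr == first_digit)
		return false;			/* no digits at all, e.g. "" or "0x" */

	while (isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr != '\0')
		return false;			/* trailing junk */

	if (!neg)
	{
		if (tmp == INT32_MIN)
			return false;		/* -INT32_MIN does not fit */
		tmp = -tmp;
	}
	*result = tmp;
	return true;
}
```

The sketch requires at least one digit after the prefix, so "0x" on its own is rejected rather than read as zero.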
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0394edb15f..09155a3d5d 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,26 +323,41 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -848,13 +863,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -868,10 +901,19 @@ other			.
 {realfail2}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 25fb3b43b3..58d1a00d65 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,26 +351,41 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -400,9 +415,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -415,7 +427,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -933,17 +945,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
-{decimal}		{
+{hexinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 16);
+				}
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -952,27 +967,43 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
 <SQL>{
-/*
- * Note that some trailing junk is valid in C (such as 100LL), so we contain
- * this to SQL mode.
- */
-{integer_junk}	{
+{octinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 8);
+				}
+{bininteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 2);
+				}
+
+	/*
+	 * Note that some trailing junk is valid in C (such as 100LL), so we contain
+	 * this to SQL mode.
+	 */
+{decinteger_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{hexinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{numeric_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
 {real_junk}		{
@@ -1033,19 +1064,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *)yytext,&endptr,16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1570,17 +1588,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..220e1493e8 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..6fdbd58b40 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,22 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..edd15a4353 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 77d4843417..d95b24c7b3 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,33 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -45,6 +64,42 @@ PREPARE p1 AS SELECT $1a;
 ERROR:  trailing junk after parameter at or near "$1a"
 LINE 1: PREPARE p1 AS SELECT $1a;
                              ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+               ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+               ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+               ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..0dee22fe6d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..2a69b1614e 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,10 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..b7ad696dd8 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index be7d6dfe0c..0e12bcc7b7 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,16 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -18,6 +24,19 @@
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1
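The ecpg rules above strip the two-character prefix (`yytext + 2`) and pass a base to `process_integer_literal()`. Here is a minimal C sketch of that idea. The name `parse_prefixed_integer()` and its bool-returning interface are hypothetical. It uses the standard `strtol()` rather than PostgreSQL's `strtoint()`, and it validates the digits itself, because unlike the lexer it cannot rely on the `{hexinteger}`/`{octinteger}`/`{bininteger}` patterns having matched first.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Is c a valid digit in the given base (2, 8, 10 or 16)? */
static bool
is_base_digit(char c, int base)
{
	if (base == 16)
		return isxdigit((unsigned char) c) != 0;
	return c >= '0' && c < '0' + base;
}

/*
 * Hypothetical helper (not part of the patch): parse a literal that may
 * carry a 0x/0o/0b prefix.  Like the ecpg rules, it skips the prefix and
 * converts the remaining digits in the matching base.  Returns false for a
 * bare prefix, an invalid digit, or a value that does not fit in int.
 */
static bool
parse_prefixed_integer(const char *token, int *result)
{
	int			base = 10;
	const char *digits = token;
	const char *p;
	char	   *endptr;
	long		val;

	if (token[0] == '0')
	{
		switch (token[1])
		{
			case 'x':
			case 'X':
				base = 16;
				break;
			case 'o':
			case 'O':
				base = 8;
				break;
			case 'b':
			case 'B':
				base = 2;
				break;
			default:
				break;
		}
		if (base != 10)
			digits = token + 2;	/* same as yytext + 2 in the lexer */
	}

	/* "0x" alone is the {hexfail} case: require at least one digit */
	if (*digits == '\0')
		return false;
	for (p = digits; *p; p++)
		if (!is_base_digit(*p, base))
			return false;

	errno = 0;
	val = strtol(digits, &endptr, base);
	if (errno == ERANGE || *endptr != '\0' || val < INT_MIN || val > INT_MAX)
		return false;

	*result = (int) val;
	return true;
}
```

Where this sketch simply returns false for an oversized value, the real `process_integer_literal()` instead falls back to treating the token as a float, per its "treat it as a float" comment. The values in the test match the regression outputs in the patch (37, 187, 1071).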

v7-0007-WIP-Underscores-in-numeric-literals.patch (text/plain; charset=UTF-8)

From ac104eaa206f6b98631a2ef18bfdb0afb494bb9c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 7/7] WIP: Underscores in numeric literals

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/Makefile              |  2 +-
 src/backend/parser/scan.l                | 26 +++++++++++++++---
 src/test/regress/expected/numerology.out | 34 +++++++++++++++++++++---
 src/test/regress/sql/numerology.sql      |  7 ++++-
 4 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
index 5ddb9a92f0..827bc4c189 100644
--- a/src/backend/parser/Makefile
+++ b/src/backend/parser/Makefile
@@ -56,7 +56,7 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s
 
 
 scan.c: FLEXFLAGS = -CF -p -p
-scan.c: FLEX_NO_BACKUP=yes
+#scan.c: FLEX_NO_BACKUP=yes
 scan.c: FLEX_FIX_WARNING=yes
 
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 2e1aa62d81..5b574c4233 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -395,10 +395,10 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
 hexfail			0[xX]
 octfail			0[oO]
@@ -1372,6 +1372,24 @@ process_integer_literal(const char *token, YYSTYPE *lval, int base)
 	int			val;
 	char	   *endptr;
 
+	if (strchr(token, '_'))
+	{
+		char	   *newtoken = palloc(strlen(token));
+		const char *p1;
+		char	   *p2;
+
+		p1 = token;
+		p2 = newtoken;
+		while (*p1)
+		{
+			if (*p1 != '_')
+				*p2++ = *p1;
+			p1++;
+		}
+		*p2 = '\0';
+		token = newtoken;
+	}
+
 	errno = 0;
 	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index d95b24c7b3..7289a325fc 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -23,6 +23,36 @@ SELECT 0x42F;
      1071
 (1 row)
 
+SELECT 1_000_000;
+ ?column? 
+----------
+  1000000
+(1 row)
+
+SELECT 1_2_3;
+ ?column? 
+----------
+      123
+(1 row)
+
+SELECT 0x1EEE_FFFF;
+ ?column?  
+-----------
+ 518979583
+(1 row)
+
+SELECT 0o2_73;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0b_10_0101;
+ ?column? 
+----------
+       37
+(1 row)
+
 -- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
@@ -32,10 +62,6 @@ SELECT 0x0o;
 ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
-SELECT 1_2_3;
-ERROR:  trailing junk after numeric literal at or near "1_"
-LINE 1: SELECT 1_2_3;
-               ^
 SELECT 0.a;
 ERROR:  trailing junk after numeric literal at or near "0.a"
 LINE 1: SELECT 0.a;
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 0e12bcc7b7..f35ff31d9a 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -12,10 +12,15 @@
 SELECT 0o273;
 SELECT 0x42F;
 
+SELECT 1_000_000;
+SELECT 1_2_3;
+SELECT 0x1EEE_FFFF;
+SELECT 0o2_73;
+SELECT 0b_10_0101;
+
 -- error cases
 SELECT 123abc;
 SELECT 0x0o;
-SELECT 1_2_3;
 SELECT 0.a;
 SELECT 0.0a;
 SELECT .0a;
-- 
2.34.1
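The WIP rules accept underscores only between digits (`{decdigit}(_?{decdigit})*`), plus directly after a `0x`/`0o`/`0b` prefix (`(_?{hexdigit})+` and friends). `process_integer_literal()` then removes them before calling `strtoint()`. Below is a minimal C sketch of both steps, assuming the prefix has already been stripped. The names `digits_with_underscores_ok()` and `strip_underscores()` are hypothetical, and `malloc()` stands in for `palloc()`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Is c a valid digit in the given base (2, 8, 10 or 16)? */
static bool
is_base_digit(char c, int base)
{
	if (base == 16)
		return isxdigit((unsigned char) c) != 0;
	return c >= '0' && c < '0' + base;
}

/*
 * Check a digit string against the WIP patterns: decimal must start with a
 * digit; the other bases may start with '_'.  Underscores are single and
 * never trailing.
 */
static bool
digits_with_underscores_ok(const char *s, int base)
{
	bool		prev_underscore = false;

	if (*s == '\0' || (*s == '_' && base == 10))
		return false;

	for (; *s; s++)
	{
		if (*s == '_')
		{
			if (prev_underscore)
				return false;	/* "__" never matches */
			prev_underscore = true;
		}
		else if (is_base_digit(*s, base))
			prev_underscore = false;
		else
			return false;
	}
	return !prev_underscore;	/* a trailing '_' never matches */
}

/* Copy token without underscores; the caller frees the result. */
static char *
strip_underscores(const char *token)
{
	char	   *out = malloc(strlen(token) + 1);
	char	   *p = out;

	if (out == NULL)
		return NULL;
	for (; *token; token++)
	{
		if (*token != '_')
			*p++ = *token;
	}
	*p = '\0';
	return out;
}
```

The sketch allocates `strlen(token) + 1` bytes so that it works for any input. The patch's `palloc(strlen(token))` is also sufficient, because that code runs only after `strchr()` has found at least one underscore. The stripped copy is therefore at least one byte shorter, so the terminator still fits.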

#18

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Peter Eisentraut (#17)

7 attachment(s)

Re: Non-decimal integer literals

Rebased patch set


On 13.01.22 14:42, Peter Eisentraut wrote:

Another modest update, because the copyright year update prevented
the previous patches from applying cleanly.

I also did a bit of work on the ecpg scanner so that it also handles
some errors on par with the main scanner.

There is still no automated testing of this in ecpg, but I have a bunch
of single-line test files that can provoke various errors. I will keep
these around and maybe put them into something more formal in the future.

On 30.12.21 10:43, Peter Eisentraut wrote:
There has been some other refactoring going on, which made this patch
set out of date. So here is an update.

The old pg_strtouint64() has been removed, so there is no longer a
naming concern with patch 0001. That one should be good to go.

I also found that pg_atoi(), yet another way to parse integers, has
mostly outlived its usefulness, so I removed its last two callers and
then the function itself in 0002 and 0003.

The remaining patches are as before, with some of the review comments
applied. I still need to write some lexing unit tests for ecpg, which
I haven't gotten to yet. This affects patches 0004 and 0005.

As mentioned before, patches 0006 and 0007 are more like feature
previews at this point.

On 01.12.21 16:47, Peter Eisentraut wrote:
On 25.11.21 18:51, John Naylor wrote:

If we're going to change the comment anyway, "the parser" sounds
more natural. Aside from that, 0001 and 0002 can probably be pushed
now, if you like.

done
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
  realfail1 ({integer}|{decimal})[Ee]
  realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}
A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.
Yeah, it's a bit weird that not all the symbols are used in ecpg.
I'll look into explaining this better.
0006
+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
   }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }
It seems these could use SET_YYLLOC(), since the error cursor
doesn't match other failure states:
ok

We might consider some tests for ECPG since lack of coverage has
been a problem.

right

Also, I'm curious: how does the spec work as far as deciding the
year of release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1. But the preparation work (up to v5-0005, and some other
number parsing refactoring that I'm seeing) could be considered for
PG15.

I'll move this to the next CF and come back with an updated patch set
in a little while.

Attachments:

v8-0001-Move-scanint8-to-numutils.c.patch (text/plain; charset=UTF-8)

From 4647813fab7f252994f23c865979df30cce6a7c8 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 1/7] Move scanint8() to numutils.c

Move scanint8() to numutils.c and rename to pg_strtoint64().  We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent.  The API is also changed to no longer provide the errorOK
case.  Users that need the error checking can use strtoi64().

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/parse_node.c             | 12 ++-
 src/backend/replication/pgoutput/pgoutput.c |  9 ++-
 src/backend/utils/adt/int8.c                | 90 +--------------------
 src/backend/utils/adt/numutils.c            | 84 +++++++++++++++++++
 src/bin/pgbench/pgbench.c                   |  4 +-
 src/include/utils/builtins.h                |  1 +
 src/include/utils/int8.h                    | 25 ------
 7 files changed, 103 insertions(+), 122 deletions(-)
 delete mode 100644 src/include/utils/int8.h

diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 35db6b6c98..a49c985d36 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -26,7 +26,6 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
@@ -353,7 +352,6 @@ make_const(ParseState *pstate, A_Const *aconst)
 {
 	Const	   *con;
 	Datum		val;
-	int64		val64;
 	Oid			typeid;
 	int			typelen;
 	bool		typebyval;
@@ -384,8 +382,15 @@ make_const(ParseState *pstate, A_Const *aconst)
 			break;
 
 		case T_Float:
+		{
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(aconst->val.fval.fval, true, &val64))
+
+			int64		val64;
+			char	   *endptr;
+
+			errno = 0;
+			val64 = strtoi64(aconst->val.fval.fval, &endptr, 10);
+			if (errno == 0 && *endptr == '\0')
 			{
 				/*
 				 * It might actually fit in int32. Probably only INT_MIN can
@@ -425,6 +430,7 @@ make_const(ParseState *pstate, A_Const *aconst)
 				typebyval = false;
 			}
 			break;
+		}
 
 		case T_Boolean:
 			val = BoolGetDatum(boolVal(&aconst->val));
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index af8d51aee9..0570caa351 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -21,7 +21,6 @@
 #include "replication/logicalproto.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
-#include "utils/int8.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -205,7 +204,8 @@ parse_output_parameters(List *options, PGOutputData *data)
 		/* Check each param, whether or not we recognize it */
 		if (strcmp(defel->defname, "proto_version") == 0)
 		{
-			int64		parsed;
+			unsigned long parsed;
+			char	   *endptr;
 
 			if (protocol_version_given)
 				ereport(ERROR,
@@ -213,12 +213,14 @@ parse_output_parameters(List *options, PGOutputData *data)
 						 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
 
-			if (!scanint8(strVal(defel->arg), true, &parsed))
+			errno = 0;
+			parsed = strtoul(strVal(defel->arg), &endptr, 10);
+			if (errno || *endptr != '\0')
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid proto_version")));
 
-			if (parsed > PG_UINT32_MAX || parsed < 0)
+			if (parsed > PG_UINT32_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("proto_version \"%s\" out of range",
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index ad19d154ff..4a87114a4f 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -24,7 +24,6 @@
 #include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 
 
 typedef struct
@@ -45,99 +44,14 @@ typedef struct
  * Formatting and conversion routines.
  *---------------------------------------------------------*/
 
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is bad.
- * If errorOK is true, just return "false" for bad input.
- */
-bool
-scanint8(const char *str, bool errorOK, int64 *result)
-{
-	const char *ptr = str;
-	int64		tmp = 0;
-	bool		neg = false;
-
-	/*
-	 * Do our own scan, rather than relying on sscanf which might be broken
-	 * for long long.
-	 *
-	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
-	 * value as a negative number.
-	 */
-
-	/* skip leading spaces */
-	while (*ptr && isspace((unsigned char) *ptr))
-		ptr++;
-
-	/* handle sign */
-	if (*ptr == '-')
-	{
-		ptr++;
-		neg = true;
-	}
-	else if (*ptr == '+')
-		ptr++;
-
-	/* require at least one digit */
-	if (unlikely(!isdigit((unsigned char) *ptr)))
-		goto invalid_syntax;
-
-	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
-	{
-		int8		digit = (*ptr++ - '0');
-
-		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
-	}
-
-	/* allow trailing whitespace, but not other trailing chars */
-	while (*ptr != '\0' && isspace((unsigned char) *ptr))
-		ptr++;
-
-	if (unlikely(*ptr != '\0'))
-		goto invalid_syntax;
-
-	if (!neg)
-	{
-		/* could fail if input is most negative number */
-		if (unlikely(tmp == PG_INT64_MIN))
-			goto out_of_range;
-		tmp = -tmp;
-	}
-
-	*result = tmp;
-	return true;
-
-out_of_range:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("value \"%s\" is out of range for type %s",
-						str, "bigint")));
-	return false;
-
-invalid_syntax:
-	if (!errorOK)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"bigint", str)));
-	return false;
-}
-
 /* int8in()
  */
 Datum
 int8in(PG_FUNCTION_ARGS)
 {
-	char	   *str = PG_GETARG_CSTRING(0);
-	int64		result;
+	char	   *num = PG_GETARG_CSTRING(0);
 
-	(void) scanint8(str, false, &result);
-	PG_RETURN_INT64(result);
+	PG_RETURN_INT64(pg_strtoint64(num));
 }
 
 
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 898a9e3f9a..e82d23a325 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -325,6 +325,90 @@ pg_strtoint32(const char *s)
 	return 0;					/* keep compiler quiet */
 }
 
+/*
+ * Convert input string to a signed 64 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. Will throw
+ * ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int64
+pg_strtoint64(const char *s)
+{
+	const char *ptr = s;
+	int64		tmp = 0;
+	bool		neg = false;
+
+	/*
+	 * Do our own scan, rather than relying on sscanf which might be broken
+	 * for long long.
+	 *
+	 * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
+	 * value as a negative number.
+	 */
+
+	/* skip leading spaces */
+	while (*ptr && isspace((unsigned char) *ptr))
+		ptr++;
+
+	/* handle sign */
+	if (*ptr == '-')
+	{
+		ptr++;
+		neg = true;
+	}
+	else if (*ptr == '+')
+		ptr++;
+
+	/* require at least one digit */
+	if (unlikely(!isdigit((unsigned char) *ptr)))
+		goto invalid_syntax;
+
+	/* process digits */
+	while (*ptr && isdigit((unsigned char) *ptr))
+	{
+		int8		digit = (*ptr++ - '0');
+
+		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
+			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+			goto out_of_range;
+	}
+
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+		ptr++;
+
+	if (unlikely(*ptr != '\0'))
+		goto invalid_syntax;
+
+	if (!neg)
+	{
+		/* could fail if input is most negative number */
+		if (unlikely(tmp == PG_INT64_MIN))
+			goto out_of_range;
+		tmp = -tmp;
+	}
+
+	return tmp;
+
+out_of_range:
+	ereport(ERROR,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value \"%s\" is out of range for type %s",
+					s, "bigint")));
+
+invalid_syntax:
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"bigint", s)));
+
+	return 0;					/* keep compiler quiet */
+}
+
 /*
  * pg_itoa: converts a signed 16-bit integer to its string representation
  * and returns strlen(a).
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 97f2a1f80a..f166a77e3a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -787,8 +787,8 @@ is_an_int(const char *str)
 /*
  * strtoint64 -- convert a string to 64-bit integer
  *
- * This function is a slightly modified version of scanint8() from
- * src/backend/utils/adt/int8.c.
+ * This function is a slightly modified version of pg_strtoint64() from
+ * src/backend/utils/adt/numutils.c.
  *
  * The function returns whether the conversion worked, and if so
  * "*result" is set to the result.
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 7ac4780e3f..191cc854a3 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -46,6 +46,7 @@ extern int	namestrcmp(Name name, const char *str);
 extern int32 pg_atoi(const char *s, int size, int c);
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
+extern int64 pg_strtoint64(const char *s);
 extern int	pg_itoa(int16 i, char *a);
 extern int	pg_ultoa_n(uint32 l, char *a);
 extern int	pg_ulltoa_n(uint64 l, char *a);
diff --git a/src/include/utils/int8.h b/src/include/utils/int8.h
deleted file mode 100644
index f0386c4008..0000000000
--- a/src/include/utils/int8.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * int8.h
- *	  Declarations for operations on 64-bit integers.
- *
- *
- * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * src/include/utils/int8.h
- *
- * NOTES
- * These data types are supported on all 64-bit architectures, and may
- *	be supported through libraries on some 32-bit machines. If your machine
- *	is not currently supported, then please try to make it so, then post
- *	patches to the postgresql.org hackers mailing list.
- *
- *-------------------------------------------------------------------------
- */
-#ifndef INT8_H
-#define INT8_H
-
-extern bool scanint8(const char *str, bool errorOK, int64 *result);
-
-#endif							/* INT8_H */

base-commit: f032f63e727c1ab07603b3d1cd88d50f850d5738
-- 
2.34.1
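The pgoutput.c hunk replaces scanint8() with strtoul(). Two properties of strtoul() are easy to miss. First, it leaves errno untouched on success, so errno must be cleared before the call. Second, it accepts a leading '-' and silently negates the result. Here is a minimal C sketch of a parse that handles both. The helper name `parse_uint32_option()` is hypothetical and not part of the patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a non-negative decimal that must fit in
 * uint32, rejecting empty input, trailing junk and a minus sign.
 */
static bool
parse_uint32_option(const char *str, uint32_t *result)
{
	const char *p = str;
	unsigned long parsed;
	char	   *endptr;

	/* strtoul() skips leading whitespace, then accepts a sign */
	while (isspace((unsigned char) *p))
		p++;
	if (*p == '-')
		return false;			/* strtoul("-1") would wrap to ULONG_MAX */

	errno = 0;					/* strtoul() does not reset errno itself */
	parsed = strtoul(str, &endptr, 10);
	if (errno != 0 || endptr == str || *endptr != '\0')
		return false;			/* overflow, no digits, or trailing junk */
	if (parsed > UINT32_MAX)
		return false;

	*result = (uint32_t) parsed;
	return true;
}
```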

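pg_strtoint64() accumulates the value as a negative number because INT64_MIN has no positive int64 counterpart. Here is a minimal, self-contained C sketch of that technique. The name `sketch_strtoint64()` and its bool return are hypothetical. The real function reports errors with ereport() and detects overflow with pg_mul_s64_overflow()/pg_sub_s64_overflow(). This sketch replaces those with a hand-written bound check so that it compiles without PostgreSQL headers.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

static bool
sketch_strtoint64(const char *s, int64_t *result)
{
	const char *ptr = s;
	int64_t		tmp = 0;		/* always <= 0 while digits are read */
	bool		neg = false;

	while (*ptr && isspace((unsigned char) *ptr))
		ptr++;

	if (*ptr == '-')
	{
		neg = true;
		ptr++;
	}
	else if (*ptr == '+')
		ptr++;

	if (!isdigit((unsigned char) *ptr))
		return false;			/* require at least one digit */

	while (isdigit((unsigned char) *ptr))
	{
		int			digit = *ptr++ - '0';

		/* would tmp * 10 - digit drop below INT64_MIN? */
		if (tmp < (INT64_MIN + digit) / 10)
			return false;
		tmp = tmp * 10 - digit;
	}

	/* allow trailing whitespace, but nothing else */
	while (*ptr && isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr != '\0')
		return false;

	if (!neg)
	{
		if (tmp == INT64_MIN)
			return false;		/* -INT64_MIN does not fit */
		tmp = -tmp;
	}

	*result = tmp;
	return true;
}
```

The bound check relies on C integer division truncating toward zero. For a negative X, X / 10 is the ceiling of X/10, so `tmp >= (INT64_MIN + digit) / 10` holds exactly when `tmp * 10 - digit >= INT64_MIN`.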
v8-0002-Remove-one-use-of-pg_atoi.patch (text/plain; charset=UTF-8)

From abe95a3c1e3dcfa5a1aa3abc9e75e4bc8042a330 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 2/7] Remove one use of pg_atoi()

There was no real need to use this here instead of a simpler API.
---
 src/backend/utils/adt/jsonpath_gram.y | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 7a251b892d..7311d12e35 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -232,7 +232,7 @@ array_accessor:
 	;
 
 any_level:
-	INT_P							{ $$ = pg_atoi($1.val, 4, 0); }
+	INT_P							{ $$ = pg_strtoint32($1.val); }
 	| LAST_P						{ $$ = -1; }
 	;
 
-- 
2.34.1

v8-0003-Remove-pg_atoi.patch (text/plain; charset=UTF-8)

From e74674404800cd04f2b179004ebab00434607961 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 3/7] Remove pg_atoi()

The last caller was int2vectorin(), and having such a general function
for one user didn't seem useful, so just put the required parts inline
and remove the function.
---
 src/backend/utils/adt/int.c      | 32 ++++++++++--
 src/backend/utils/adt/numutils.c | 88 --------------------------------
 src/include/utils/builtins.h     |  1 -
 3 files changed, 28 insertions(+), 93 deletions(-)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 8bd234c11c..42ddae99ef 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -146,15 +146,39 @@ int2vectorin(PG_FUNCTION_ARGS)
 
 	result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
 
-	for (n = 0; *intString && n < FUNC_MAX_ARGS; n++)
+	for (n = 0; n < FUNC_MAX_ARGS; n++)
 	{
+		long		l;
+		char	   *endp;
+
 		while (*intString && isspace((unsigned char) *intString))
 			intString++;
 		if (*intString == '\0')
 			break;
-		result->values[n] = pg_atoi(intString, sizeof(int16), ' ');
-		while (*intString && !isspace((unsigned char) *intString))
-			intString++;
+
+		errno = 0;
+		l = strtol(intString, &endp, 10);
+
+		if (intString == endp)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s: \"%s\"",
+							"smallint", intString)));
+
+		if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
+			ereport(ERROR,
+					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					 errmsg("value \"%s\" is out of range for type %s", intString,
+							"smallint")));
+
+		if (*endp && *endp != ' ')
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					 errmsg("invalid input syntax for type %s: \"%s\"",
+							"smallint", intString)));
+
+		result->values[n] = l;
+		intString = endp;
 	}
 	while (*intString && isspace((unsigned char) *intString))
 		intString++;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index e82d23a325..cc3f95d399 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,94 +85,6 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
-/*
- * pg_atoi: convert string to integer
- *
- * allows any number of leading or trailing whitespace characters.
- *
- * 'size' is the sizeof() the desired integral result (1, 2, or 4 bytes).
- *
- * c, if not 0, is a terminator character that may appear after the
- * integer (plus whitespace).  If 0, the string must end after the integer.
- *
- * Unlike plain atoi(), this will throw ereport() upon bad input format or
- * overflow.
- */
-int32
-pg_atoi(const char *s, int size, int c)
-{
-	long		l;
-	char	   *badp;
-
-	/*
-	 * Some versions of strtol treat the empty string as an error, but some
-	 * seem not to.  Make an explicit test to be sure we catch it.
-	 */
-	if (s == NULL)
-		elog(ERROR, "NULL pointer");
-	if (*s == 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	errno = 0;
-	l = strtol(s, &badp, 10);
-
-	/* We made no progress parsing the string, so bail out */
-	if (s == badp)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	switch (size)
-	{
-		case sizeof(int32):
-			if (errno == ERANGE
-#if defined(HAVE_LONG_INT_64)
-			/* won't get ERANGE on these with 64-bit longs... */
-				|| l < INT_MIN || l > INT_MAX
-#endif
-				)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for type %s", s,
-								"integer")));
-			break;
-		case sizeof(int16):
-			if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for type %s", s,
-								"smallint")));
-			break;
-		case sizeof(int8):
-			if (errno == ERANGE || l < SCHAR_MIN || l > SCHAR_MAX)
-				ereport(ERROR,
-						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-						 errmsg("value \"%s\" is out of range for 8-bit integer", s)));
-			break;
-		default:
-			elog(ERROR, "unsupported result size: %d", size);
-	}
-
-	/*
-	 * Skip any trailing whitespace; if anything but whitespace remains before
-	 * the terminating character, bail out
-	 */
-	while (*badp && *badp != c && isspace((unsigned char) *badp))
-		badp++;
-
-	if (*badp && *badp != c)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"integer", s)));
-
-	return (int32) l;
-}
-
 /*
  * Convert input string to a signed 16 bit integer.
  *
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 191cc854a3..58abf4364a 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -43,7 +43,6 @@ extern void namestrcpy(Name name, const char *str);
 extern int	namestrcmp(Name name, const char *str);
 
 /* numutils.c */
-extern int32 pg_atoi(const char *s, int size, int c);
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
 extern int64 pg_strtoint64(const char *s);
-- 
2.34.1

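The parsing loop that this patch inlines into int2vectorin() can be tried outside the backend. The following is a minimal sketch, not the patch's code: parse_int2_list() and its result codes are hypothetical, errors are returned instead of raised with ereport(), and where the patch accepts only a space after each number, this version accepts any whitespace.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result codes; the backend reports these cases via ereport(). */
typedef enum
{
	PARSE_OK = 0,
	PARSE_SYNTAX,
	PARSE_RANGE,
	PARSE_TOO_MANY
} parse_result;

/*
 * Parse up to maxn whitespace-separated int16 values from s into values[].
 * On success, *count receives the number of values parsed.
 */
static parse_result
parse_int2_list(const char *s, int16_t *values, int maxn, int *count)
{
	int			n;

	for (n = 0; n < maxn; n++)
	{
		long		l;
		char	   *endp;

		while (*s && isspace((unsigned char) *s))
			s++;
		if (*s == '\0')
			break;

		errno = 0;
		l = strtol(s, &endp, 10);

		if (s == endp)
			return PARSE_SYNTAX;	/* strtol consumed no digits */
		if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
			return PARSE_RANGE;
		if (*endp && !isspace((unsigned char) *endp))
			return PARSE_SYNTAX;	/* trailing junk such as "12x" */

		values[n] = (int16_t) l;
		s = endp;
	}

	/* anything left after maxn values means the list was too long */
	while (*s && isspace((unsigned char) *s))
		s++;
	if (*s)
		return PARSE_TOO_MANY;

	*count = n;
	return PARSE_OK;
}
```

Checking the strtol() end pointer, errno and the target range separately is what lets the inlined version drop pg_atoi()'s generic size switch: only the int16 case is needed here.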
v8-0004-Add-test-case-for-trailing-junk-after-numeric-lit.patch (text/plain)

From 77ce3c647b6e70a93782f4600bfff7262ab611d4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 4/7] Add test case for trailing junk after numeric literals

PostgreSQL currently accepts numeric literals with trailing
non-digits, such as 123abc, where the abc is treated as the next token.
This may be a bit surprising.  This commit adds test cases for this;
subsequent commits intend to change this behavior.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/test/regress/expected/numerology.out | 62 ++++++++++++++++++++++++
 src/test/regress/sql/numerology.sql      | 16 ++++++
 2 files changed, 78 insertions(+)

diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 44d6c435de..2ffc73e854 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -2,6 +2,68 @@
 -- NUMEROLOGY
 -- Test various combinations of numeric types and functions.
 --
+--
+-- Trailing junk in numeric literals
+--
+SELECT 123abc;
+ abc 
+-----
+ 123
+(1 row)
+
+SELECT 0x0o;
+ x0o 
+-----
+   0
+(1 row)
+
+SELECT 1_2_3;
+ _2_3 
+------
+    1
+(1 row)
+
+SELECT 0.a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT .0a;
+  a  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e1a;
+ a 
+---
+ 0
+(1 row)
+
+SELECT 0.0e;
+  e  
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e+a;
+ERROR:  syntax error at or near "+"
+LINE 1: SELECT 0.0e+a;
+                   ^
+PREPARE p1 AS SELECT $1a;
+EXECUTE p1(1);
+ a 
+---
+ 1
+(1 row)
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fddb58f8fd..fb75f97832 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,6 +3,22 @@
 -- Test various combinations of numeric types and functions.
 --
 
+--
+-- Trailing junk in numeric literals
+--
+
+SELECT 123abc;
+SELECT 0x0o;
+SELECT 1_2_3;
+SELECT 0.a;
+SELECT 0.0a;
+SELECT .0a;
+SELECT 0.0e1a;
+SELECT 0.0e;
+SELECT 0.0e+a;
+PREPARE p1 AS SELECT $1a;
+EXECUTE p1(1);
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1

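The expected output above follows from the lexer taking the longest prefix that matches {integer} or {decimal} and leaving whatever follows for the next token, which the grammar then reads as a column alias. A simplified sketch of that behavior, assuming a hypothetical scan_number_prefix() helper and deliberately ignoring exponents and the "1..10" special case:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Return the length of the longest prefix of s that matches the unpatched
 * {integer} or {decimal} patterns (digits, optionally with a decimal point).
 * Whatever follows is left for the next token.
 */
static size_t
scan_number_prefix(const char *s)
{
	size_t		i = 0;
	size_t		int_digits;

	while (isdigit((unsigned char) s[i]))
		i++;
	int_digits = i;

	if (s[i] == '.')
	{
		size_t		j = i + 1;

		while (isdigit((unsigned char) s[j]))
			j++;
		/* "1." and ".5" are decimals; a bare "." is not a number */
		if (int_digits > 0 || j > i + 1)
			i = j;
	}
	return i;
}
```

For "0x0o" this returns 1, so "x0o" is left over and becomes the alias, which matches the x0o column in the expected output.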
v8-0005-Reject-trailing-junk-after-numeric-literals.patch (text/plain)

From 74d12688a2ff4bb37b06423ce6f9c7970347442e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 5/7] Reject trailing junk after numeric literals

After this, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which would be scanned as
two tokens: 123 and abc.  This is undocumented and surprising, and it
might also interfere with some extended numeric literal syntax being
contemplated for the future.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/scan.l                | 32 +++++++---
 src/fe_utils/psqlscan.l                  | 25 +++++---
 src/interfaces/ecpg/preproc/pgc.l        | 22 +++++++
 src/test/regress/expected/numerology.out | 77 +++++++++---------------
 src/test/regress/sql/numerology.sql      |  1 -
 5 files changed, 91 insertions(+), 66 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index f555ac6e6d..ab24bf70db 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -399,7 +399,12 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
+param_junk		\${integer}{ident_start}
 
 other			.
 
@@ -974,6 +979,10 @@ other			.
 					yylval->ival = atol(yytext + 1);
 					return PARAM;
 				}
+{param_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after parameter");
+				}
 
 {integer}		{
 					SET_YYLLOC();
@@ -996,19 +1005,24 @@ other			.
 					return FCONST;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 */
-					yyless(yyleng - 1);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("trailing junk after numeric literal");
+				}
+{integer_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{decimal_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{real_junk}		{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
 				}
 
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 941ed06553..0394edb15f 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -337,7 +337,12 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
+param_junk		\${integer}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -839,6 +844,9 @@ other			.
 {param}			{
 					ECHO;
 				}
+{param_junk}	{
+					ECHO;
+				}
 
 {integer}		{
 					ECHO;
@@ -855,17 +863,18 @@ other			.
 					ECHO;
 				}
 {realfail1}		{
-					/*
-					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
-					 * (in psql, we don't actually care...)
-					 */
-					yyless(yyleng - 1);
 					ECHO;
 				}
 {realfail2}		{
-					/* throw back the [Ee][+-], and proceed as above */
-					yyless(yyleng - 2);
+					ECHO;
+				}
+{integer_junk}	{
+					ECHO;
+				}
+{decimal_junk}	{
+					ECHO;
+				}
+{real_junk}		{
 					ECHO;
 				}
 
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 9286a0355d..a727a6b6ad 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,7 +365,12 @@ real			({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1		({integer}|{decimal})[Ee]
 realfail2		({integer}|{decimal})[Ee][-+]
 
+integer_junk	{integer}{ident_start}
+decimal_junk	{decimal}{ident_start}
+real_junk		{real}{ident_start}
+
 param			\${integer}
+param_junk		\${integer}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -917,6 +922,9 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 					base_yylval.ival = atol(yytext+1);
 					return PARAM;
 				}
+{param_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after parameter");
+				}
 
 {ip}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -957,6 +965,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 } /* <C,SQL> */
 
 <SQL>{
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */
+{integer_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{decimal_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{real_junk}		{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+
 :{identifier}((("->"|\.){identifier})|(\[{array}\]))*	{
 					base_yylval.str = mm_strdup(yytext+1);
 					return CVARIABLE;
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 2ffc73e854..77d4843417 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,64 +6,45 @@
 -- Trailing junk in numeric literals
 --
 SELECT 123abc;
- abc 
------
- 123
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+               ^
 SELECT 0x0o;
- x0o 
------
-   0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+               ^
 SELECT 1_2_3;
- _2_3 
-------
-    1
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+               ^
 SELECT 0.a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+               ^
 SELECT 0.0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+               ^
 SELECT .0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+               ^
 SELECT 0.0e1a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+               ^
 SELECT 0.0e;
-  e  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+               ^
 SELECT 0.0e+a;
-ERROR:  syntax error at or near "+"
+ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
-                   ^
+               ^
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
- a 
----
- 1
-(1 row)
-
+ERROR:  trailing junk after parameter at or near "$1a"
+LINE 1: PREPARE p1 AS SELECT $1a;
+                             ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fb75f97832..be7d6dfe0c 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -17,7 +17,6 @@
 SELECT 0.0e;
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
 
 --
 -- Test implicit type conversions
-- 
2.34.1

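The rules added in this patch amount to one check after the longest numeric match: if the next character could start an identifier (ident_start, that is [A-Za-z\200-\377_]), or an exponent marker is not followed by digits, the lexer reports an error instead of splitting the input into two tokens. A minimal sketch of that check, with numeric_token_length() as a hypothetical helper that returns -1 for trailing junk; it approximates the flex rules rather than reproducing them.

```c
#include <ctype.h>
#include <stdbool.h>

/* Same character class as ident_start in scan.l: [A-Za-z\200-\377_] */
static bool
is_ident_start(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		c == '_' || c >= 0x80;
}

/*
 * Length of the numeric literal at the start of s, 0 if s does not start
 * with one, or -1 if the literal is followed by trailing junk.
 */
static int
numeric_token_length(const char *s)
{
	int			i = 0;
	bool		any_digits = false;

	/* integer part */
	while (isdigit((unsigned char) s[i]))
	{
		i++;
		any_digits = true;
	}

	/* fractional part; "1..10" stops before the ".." like {decimalfail} */
	if (s[i] == '.' && s[i + 1] != '.')
	{
		i++;
		while (isdigit((unsigned char) s[i]))
		{
			i++;
			any_digits = true;
		}
	}
	if (!any_digits)
		return 0;

	/* exponent [Ee][-+]?digits; a dangling "e" or "e+" is now an error */
	if (s[i] == 'e' || s[i] == 'E')
	{
		int			j = i + 1;

		if (s[j] == '+' || s[j] == '-')
			j++;
		if (!isdigit((unsigned char) s[j]))
			return -1;
		while (isdigit((unsigned char) s[j]))
			j++;
		i = j;
	}

	/* corresponds to {integer_junk}, {decimal_junk} and {real_junk} */
	if (is_ident_start((unsigned char) s[i]))
		return -1;

	return i;
}
```

Operators and whitespace after a number are still fine ("7+8" yields a length of 1); parameters such as $1a get the same treatment through {param_junk}.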
v8-0006-Non-decimal-integer-literals.patch (text/plain)

From 0132fb1da543b429b9001f1a682d21b1f510a3ef Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 6/7] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
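The input-function half of this patch extends pg_strtoint16/32/64() with 0x, 0o and 0b branches that, like the existing decimal loop, accumulate the value as a negative number, so that the most negative value can be parsed before the sign is applied. A self-contained sketch of the same idea for 32 bits follows. parse_int32_radix() and digit_value() are hypothetical names, a 64-bit intermediate stands in for pg_mul_s32_overflow()/pg_sub_s32_overflow(), and the sketch explicitly rejects a prefix with no digits after it, such as "0x".

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of c as a digit in the given base, or -1 if it is not a valid digit. */
static int
digit_value(unsigned char c, int base)
{
	int			v;

	if (c >= '0' && c <= '9')
		v = c - '0';
	else if (c >= 'a' && c <= 'f')
		v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		v = c - 'A' + 10;
	else
		return -1;
	return v < base ? v : -1;
}

/* Parse an optionally signed decimal, 0x, 0o or 0b integer into *result. */
static bool
parse_int32_radix(const char *s, int32_t *result)
{
	const char *ptr = s;
	const char *firstdigit;
	bool		neg = false;
	int			base = 10;
	int32_t		tmp = 0;

	while (*ptr && isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr == '-')
	{
		neg = true;
		ptr++;
	}
	else if (*ptr == '+')
		ptr++;

	/* radix prefixes are case-insensitive, as in the patch */
	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
		base = 16;
	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
		base = 8;
	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
		base = 2;
	if (base != 10)
		ptr += 2;

	/* accumulate negatively so that INT32_MIN is representable */
	firstdigit = ptr;
	for (;;)
	{
		int			d = digit_value((unsigned char) *ptr, base);
		int64_t		next;

		if (d < 0)
			break;
		next = (int64_t) tmp * base - d;
		if (next < INT32_MIN)
			return false;		/* out of range */
		tmp = (int32_t) next;
		ptr++;
	}
	if (ptr == firstdigit)
		return false;			/* no digits, e.g. "" or a bare "0x" */

	/* allow trailing whitespace, but nothing else */
	while (*ptr && isspace((unsigned char) *ptr))
		ptr++;
	if (*ptr != '\0')
		return false;

	if (!neg)
	{
		if (tmp == INT32_MIN)
			return false;		/* -INT32_MIN does not fit */
		tmp = -tmp;
	}
	*result = tmp;
	return true;
}
```

Accumulating toward negative values matters because the magnitude of INT32_MIN is one larger than INT32_MAX, so "-0x80000000" parses while "0x80000000" is reported as out of range.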
---
 doc/src/sgml/syntax.sgml                   |  26 ++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/scan.l                  | 101 +++++++++++----
 src/backend/utils/adt/numutils.c           | 140 +++++++++++++++++++++
 src/fe_utils/psqlscan.l                    |  80 +++++++++---
 src/interfaces/ecpg/preproc/pgc.l          | 116 +++++++++--------
 src/test/regress/expected/int2.out         |  19 +++
 src/test/regress/expected/int4.out         |  19 +++
 src/test/regress/expected/int8.out         |  19 +++
 src/test/regress/expected/numerology.out   |  59 ++++++++-
 src/test/regress/sql/int2.sql              |   7 ++
 src/test/regress/sql/int4.sql              |   7 ++
 src/test/regress/sql/int8.sql              |   7 ++
 src/test/regress/sql/numerology.sql        |  21 +++-
 15 files changed, 529 insertions(+), 99 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     where <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), and <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index b4f348a24d..1957fc6e2d 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index b8a78f4d41..545cb45131 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index ab24bf70db..2e1aa62d81 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,26 +385,41 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 other			.
 
@@ -984,20 +999,44 @@ other			.
 					yyerror("trailing junk after parameter");
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail}		{
+					SET_YYLLOC();
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					SET_YYLLOC();
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1012,11 +1051,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1312,17 +1363,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index cc3f95d399..37364921d5 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -120,6 +131,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -196,6 +250,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -204,6 +300,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -280,6 +377,48 @@ pg_strtoint64(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -288,6 +427,7 @@ pg_strtoint64(const char *s)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0394edb15f..09155a3d5d 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,26 +323,41 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -848,13 +863,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -868,10 +901,19 @@ other			.
 {realfail2}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index a727a6b6ad..77cd321d31 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,26 +351,41 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1		({decinteger}|{numeric})[Ee]
+realfail2		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -400,9 +415,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -415,7 +427,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -933,17 +945,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
-{decimal}		{
+{hexinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 16);
+				}
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -952,27 +967,43 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail1}		{
 					/*
 					 * throw back the [Ee], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 1);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {realfail2}		{
 					/* throw back the [Ee][+-], and proceed as above */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
 <SQL>{
-/*
- * Note that some trailing junk is valid in C (such as 100LL), so we contain
- * this to SQL mode.
- */
-{integer_junk}	{
+{octinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 8);
+				}
+{bininteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 2);
+				}
+
+	/*
+	 * Note that some trailing junk is valid in C (such as 100LL), so we contain
+	 * this to SQL mode.
+	 */
+{decinteger_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{hexinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{numeric_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
 {real_junk}		{
@@ -1033,19 +1064,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *)yytext,&endptr,16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1570,17 +1588,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..220e1493e8 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..6fdbd58b40 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,22 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..edd15a4353 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 77d4843417..d95b24c7b3 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,33 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -45,6 +64,42 @@ PREPARE p1 AS SELECT $1a;
 ERROR:  trailing junk after parameter at or near "$1a"
 LINE 1: PREPARE p1 AS SELECT $1a;
                              ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+               ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+               ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+               ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..0dee22fe6d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..2a69b1614e 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,10 @@ CREATE TABLE INT4_TBL(f1 int4);
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..b7ad696dd8 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index be7d6dfe0c..0e12bcc7b7 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,16 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -18,6 +24,19 @@
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1

v8-0007-WIP-Underscores-in-numeric-literals.patch (text/plain; charset=UTF-8)

From a47d14b8a03ea15eada3f2f464fc35a5bdb8bd5b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 7/7] WIP: Underscores in numeric literals

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 src/backend/parser/Makefile              |  2 +-
 src/backend/parser/scan.l                | 26 +++++++++++++++---
 src/test/regress/expected/numerology.out | 34 +++++++++++++++++++++---
 src/test/regress/sql/numerology.sql      |  7 ++++-
 4 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
index 5ddb9a92f0..827bc4c189 100644
--- a/src/backend/parser/Makefile
+++ b/src/backend/parser/Makefile
@@ -56,7 +56,7 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s
 
 
 scan.c: FLEXFLAGS = -CF -p -p
-scan.c: FLEX_NO_BACKUP=yes
+#scan.c: FLEX_NO_BACKUP=yes
 scan.c: FLEX_FIX_WARNING=yes
 
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 2e1aa62d81..5b574c4233 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -395,10 +395,10 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
 hexfail			0[xX]
 octfail			0[oO]
@@ -1372,6 +1372,24 @@ process_integer_literal(const char *token, YYSTYPE *lval, int base)
 	int			val;
 	char	   *endptr;
 
+	if (strchr(token, '_'))
+	{
+		char	   *newtoken = palloc(strlen(token));
+		const char *p1;
+		char	   *p2;
+
+		p1 = token;
+		p2 = newtoken;
+		while (*p1)
+		{
+			if (*p1 != '_')
+				*p2++ = *p1;
+			p1++;
+		}
+		*p2 = '\0';
+		token = newtoken;
+	}
+
 	errno = 0;
 	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index d95b24c7b3..7289a325fc 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -23,6 +23,36 @@ SELECT 0x42F;
      1071
 (1 row)
 
+SELECT 1_000_000;
+ ?column? 
+----------
+  1000000
+(1 row)
+
+SELECT 1_2_3;
+ ?column? 
+----------
+      123
+(1 row)
+
+SELECT 0x1EEE_FFFF;
+ ?column?  
+-----------
+ 518979583
+(1 row)
+
+SELECT 0o2_73;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0b_10_0101;
+ ?column? 
+----------
+       37
+(1 row)
+
 -- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
@@ -32,10 +62,6 @@ SELECT 0x0o;
 ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
-SELECT 1_2_3;
-ERROR:  trailing junk after numeric literal at or near "1_"
-LINE 1: SELECT 1_2_3;
-               ^
 SELECT 0.a;
 ERROR:  trailing junk after numeric literal at or near "0.a"
 LINE 1: SELECT 0.a;
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 0e12bcc7b7..f35ff31d9a 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -12,10 +12,15 @@
 SELECT 0o273;
 SELECT 0x42F;
 
+SELECT 1_000_000;
+SELECT 1_2_3;
+SELECT 0x1EEE_FFFF;
+SELECT 0o2_73;
+SELECT 0b_10_0101;
+
 -- error cases
 SELECT 123abc;
 SELECT 0x0o;
-SELECT 1_2_3;
 SELECT 0.a;
 SELECT 0.0a;
 SELECT .0a;
-- 
2.34.1

#19

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Peter Eisentraut (#18)

Re: Non-decimal integer literals

On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

> Rebased patch set

What if someone finds this new behavior too permissive?

--
Robert Haas
EDB: http://www.enterprisedb.com

#20

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Robert Haas (#19)

Re: Non-decimal integer literals

On 24.01.22 19:53, Robert Haas wrote:

> On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:

>> Rebased patch set

> What if someone finds this new behavior too permissive?

Which part exactly? There are several different changes proposed here.

#21

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Peter Eisentraut (#20)

Re: Non-decimal integer literals

On Tue, Jan 25, 2022 at 5:34 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

> On 24.01.22 19:53, Robert Haas wrote:

>> On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
>> <peter.eisentraut@enterprisedb.com> wrote:

>>> Rebased patch set

>> What if someone finds this new behavior too permissive?

> Which part exactly? There are several different changes proposed here.

I was just going based on the description of the feature in your
original post. If someone is hoping that int4in() will accept only
^\d+$ then they will be disappointed by this patch.

Maybe nobody is hoping that, though.

--
Robert Haas
EDB: http://www.enterprisedb.com

#22

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 4 years ago

In reply to: Peter Eisentraut (#18)

Re: Non-decimal integer literals

On 2022-Jan-24, Peter Eisentraut wrote:

> +decinteger		{decdigit}(_?{decdigit})*
> +hexinteger		0[xX](_?{hexdigit})+
> +octinteger		0[oO](_?{octdigit})+
> +bininteger		0[bB](_?{bindigit})+

I think there should be test cases for literals that these seemingly
strange expressions reject, which are a number with trailing _ (0x123_),
and one with consecutive underscores __ (0x12__34).

I like the idea of these literals. I would have found them useful on
many occasions.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/

#23

Tom Lane

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Robert Haas (#21)

Re: Non-decimal integer literals

Robert Haas <robertmhaas@gmail.com> writes:

> On Tue, Jan 25, 2022 at 5:34 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:

>> Which part exactly? There are several different changes proposed here.

> I was just going based on the description of the feature in your
> original post. If someone is hoping that int4in() will accept only
> ^\d+$ then they will be disappointed by this patch.

Maybe I misunderstood, but I thought this was about what you could
write as a SQL literal, not about the I/O behavior of the integer
types. I'd be -0.1 on changing the latter.

regards, tom lane

#24

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Tom Lane (#23)

Re: Non-decimal integer literals

On 26.01.22 01:02, Tom Lane wrote:

> Robert Haas <robertmhaas@gmail.com> writes:

>> On Tue, Jan 25, 2022 at 5:34 AM Peter Eisentraut
>> <peter.eisentraut@enterprisedb.com> wrote:

>>> Which part exactly? There are several different changes proposed here.

>> I was just going based on the description of the feature in your
>> original post. If someone is hoping that int4in() will accept only
>> ^\d+$ then they will be disappointed by this patch.

> Maybe I misunderstood, but I thought this was about what you could
> write as a SQL literal, not about the I/O behavior of the integer
> types. I'd be -0.1 on changing the latter.

I think it would be strange if I/O routines would accept a different
syntax from the literals. Also, the behavior of a cast from string/text
to a numeric type is usually defined in terms of what the literal syntax
is, so they need to be aligned.

#25

Andrew Dunstan

andrew@dunslane.net

almost 4 years ago

In reply to: Alvaro Herrera (#22)

Re: Non-decimal integer literals

On 1/25/22 13:43, Alvaro Herrera wrote:

> On 2022-Jan-24, Peter Eisentraut wrote:
>> +decinteger		{decdigit}(_?{decdigit})*
>> +hexinteger		0[xX](_?{hexdigit})+
>> +octinteger		0[oO](_?{octdigit})+
>> +bininteger		0[bB](_?{bindigit})+
> I think there should be test cases for literals that these seemingly
> strange expressions reject, which are a number with trailing _ (0x123_),
> and one with consecutive underscores __ (0x12__34).

> I like the idea of these literals. I would have found them useful on
> many occasions.

+1. I can't remember the number of times I have miscounted a long string
of digits in a literal.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#26

John Naylor

john.naylor@enterprisedb.com

almost 4 years ago

In reply to: Peter Eisentraut (#24)

1 attachment(s)

Re: Non-decimal integer literals

On Wed, Jan 26, 2022 at 10:10 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

[v8 patch]

0001-0004 seem pretty straightforward.

0005:

 {realfail1} {
- /*
- * throw back the [Ee], and figure out whether what
- * remains is an {integer} or {decimal}.
- */
- yyless(yyleng - 1);
  SET_YYLLOC();
- return process_integer_literal(yytext, yylval);
+ yyerror("trailing junk after numeric literal");
  }

realfail1 has been subsumed by integer_junk and decimal_junk, so that
pattern can be removed.

 <SQL>{
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */

It seems Flex doesn't like C comments after the "%%", so this stanza
was indented in 0006. If these are to be committed separately, that
fix should happen here.

0006:

Minor nit -- the s/decimal/numeric/ change doesn't seem to have any
notational advantage, but it's not worse, either.

0007:

I've attached an addendum to restore the no-backtrack property.

Will the underscore syntax need treatment in the input routines as well?

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v8-0007-addendum-restore-nobackup.txt (text/plain; charset=US-ASCII)

diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
index 827bc4c189..5ddb9a92f0 100644
--- a/src/backend/parser/Makefile
+++ b/src/backend/parser/Makefile
@@ -56,7 +56,7 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s
 
 
 scan.c: FLEXFLAGS = -CF -p -p
-#scan.c: FLEX_NO_BACKUP=yes
+scan.c: FLEX_NO_BACKUP=yes
 scan.c: FLEX_FIX_WARNING=yes
 
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 5b574c4233..3b311ac2dd 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -400,9 +400,9 @@ hexinteger		0[xX](_?{hexdigit})+
 octinteger		0[oO](_?{octdigit})+
 bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.

#27

Christoph Berg

myon@debian.org

almost 4 years ago

In reply to: Peter Eisentraut (#1)

Re: Non-decimal integer literals

Re: Peter Eisentraut

> This adds support in the lexer as well as in the integer type input
> functions.

> Those core parts are straightforward enough, but there are a bunch of other
> places where integers are parsed, and one could consider in each case
> whether they should get the same treatment, for example the replication
> syntax lexer, or input function for oid, numeric, and int2vector.

One thing I always found weird is that timeline IDs appear most
prominently as hex numbers in WAL filenames, but they are printed as
decimal in the log ("new timeline id nn"), and have to be specified as
decimal in recovery_target_timeline.

Perhaps both these could make use of 0xhex numbers as well.

Christoph

#28

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: John Naylor (#26)

Re: Non-decimal integer literals

On 13.02.22 13:14, John Naylor wrote:

> On Wed, Jan 26, 2022 at 10:10 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:

> [v8 patch]

> 0001-0004 seem pretty straightforward.

These have been committed.

> 0005:

> {realfail1} {
> - /*
> - * throw back the [Ee], and figure out whether what
> - * remains is an {integer} or {decimal}.
> - */
> - yyless(yyleng - 1);
> SET_YYLLOC();
> - return process_integer_literal(yytext, yylval);
> + yyerror("trailing junk after numeric literal");
> }

> realfail1 has been subsumed by integer_junk and decimal_junk, so that
> pattern can be removed.

Committed with that change.

I found that the JSON path lexer has the same trailing-junk issue. I
have researched the relevant ECMA standard and it explicitly points out
that this is not allowed. I will look into that separately. I'm just
pointing that out here because grepping for "realfail1" will still show
a hit after this.

The remaining patches are material for PG16 at this point, and I will
set the commit fest item to returned with feedback in the meantime.

> 0006:

> Minor nit -- the s/decimal/numeric/ change doesn't seem to have any
> notational advantage, but it's not worse, either.

I did that because with the addition of non-decimal literals, the word
"decimal" becomes ambiguous or misleading. (It doesn't mean "uses
decimal digits" but "has a decimal point".) (Of course, "numeric" isn't
entirely free of ambiguity, but there are only so many words available
in this space. ;-) )

> 0007:

> I've attached an addendum to restore the no-backtrack property.

Thanks, that is helpful.

> Will the underscore syntax need treatment in the input routines as well?

Yeah, various additional work is required for this.

#29

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 3 years ago

In reply to: Peter Eisentraut (#28)

1 attachment(s)

Re: Non-decimal integer literals

On 16.02.22 11:11, Peter Eisentraut wrote:

> The remaining patches are material for PG16 at this point, and I will
> set the commit fest item to returned with feedback in the meantime.

Time to continue with this.

Attached is a rebased and cleaned up patch for non-decimal integer
literals. (I don't include the underscores-in-numeric literals patch.
I'm keeping that for later.)

Two open issues from my notes:

- Technically, numeric_in() should be made aware of this, but that seems
  relatively complicated and maybe not necessary for the first iteration.

- Taking another look around ecpg to see how this interacts with C-syntax
  integer literals. I'm not aware of any particular issues, but it's
  understandably tricky.

Other than that, this seems pretty complete as a start.

Attachments:

v9-0001-Non-decimal-integer-literals.patch (text/plain; charset=UTF-8)

From d0bc72fa4c339ba2ea0bb8d1e5a3923d76ee8105 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 10 Oct 2022 16:03:15 +0200
Subject: [PATCH v9] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  26 ++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/scan.l                  |  99 +++++++++++----
 src/backend/utils/adt/numutils.c           | 140 +++++++++++++++++++++
 src/fe_utils/psqlscan.l                    |  78 +++++++++---
 src/interfaces/ecpg/preproc/pgc.l          | 108 +++++++++-------
 src/test/regress/expected/int2.out         |  19 +++
 src/test/regress/expected/int4.out         |  19 +++
 src/test/regress/expected/int8.out         |  19 +++
 src/test/regress/expected/numerology.out   |  59 ++++++++-
 src/test/regress/sql/int2.sql              |   7 ++
 src/test/regress/sql/int4.sql              |   7 ++
 src/test/regress/sql/int8.sql              |   7 ++
 src/test/regress/sql/numerology.sql        |  21 +++-
 15 files changed, 523 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f51..bba78c22f1a9 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     where <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), and <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 18725a02d1fb..95c27a625e7e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index da7c9c772e09..e897e28ed148 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8ebcc..8bb9d5fcc52d 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,25 +385,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 other			.
 
@@ -983,20 +998,44 @@ other			.
 					yyerror("trailing junk after parameter");
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext + 2, yylval, 2);
+				}
+{hexfail}		{
+					SET_YYLLOC();
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					SET_YYLLOC();
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1007,11 +1046,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1307,17 +1358,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 834ec0b5882c..55f0a20839db 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -120,6 +131,48 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -196,6 +250,48 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -204,6 +300,7 @@ pg_strtoint32(const char *s)
 			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -280,6 +377,48 @@ pg_strtoint64(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -288,6 +427,7 @@ pg_strtoint64(const char *s)
 			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec24077..cb1fc5213844 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,25 +323,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -847,13 +862,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -864,10 +897,19 @@ other			.
 {realfail}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index c145c9698f1a..b6ada6ef1f5e 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char ychar);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,25 +351,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -399,9 +414,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -414,7 +426,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -932,17 +944,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
-{decimal}		{
+{hexinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 16);
+				}
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -951,22 +966,38 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail}		{
 					/*
 					 * throw back the [Ee][+-], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
 <SQL>{
+{octinteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 8);
+				}
+{bininteger}	{
+					return process_integer_literal(yytext + 2, &base_yylval, 2);
+				}
+
 	/*
-	 * Note that some trailing junk is valid in C (such as 100LL), so we
-	 * contain this to SQL mode.
+	 * Note that some trailing junk is valid in C (such as 100LL), so we contain
+	 * this to SQL mode.
 	 */
-{integer_junk}	{
+{decinteger_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{hexinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{numeric_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
 {real_junk}		{
@@ -1036,19 +1067,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *) yytext, &endptr, 16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1573,17 +1591,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
  * ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(token, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 109cf9baaaca..39bd2684ad3d 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -304,3 +304,22 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index fbcc0e8d9e68..6ec54835b4aa 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -431,3 +431,22 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 1ae23cf3f94f..765df1de52ae 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -927,3 +927,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 77d48434173b..d95b24c7b329 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,33 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -45,6 +64,42 @@ PREPARE p1 AS SELECT $1a;
 ERROR:  trailing junk after parameter at or near "$1a"
 LINE 1: PREPARE p1 AS SELECT $1a;
                              ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+               ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+               ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+               ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index ea29066b78ee..405617fa7829 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -104,3 +104,10 @@
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index f19077f3da21..1843b718a705 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -164,3 +164,10 @@
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 38b771964d79..dd82040b0b37 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -245,3 +245,10 @@
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index be7d6dfe0c26..0e12bcc7b709 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,16 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -18,6 +24,19 @@
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)

base-commit: 06dbd619bfbfe03fefa7223838690d4012f874ad
-- 
2.37.3

#30

Junwang Zhao

zhjwpku@gmail.com

over 3 years ago

In reply to: Peter Eisentraut (#29)

Re: Non-decimal integer literals

Hi Peter,

 	/* process digits */
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+	{
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
 	while (*ptr && isdigit((unsigned char) *ptr))
 	{
 		int8		digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
 			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
 			goto out_of_range;
 	}
+	}

What do you think about moving this code into a static inline function? Something like:

static inline char*
process_digits(char *ptr, int32 *result)
{
...
}

On Mon, Oct 10, 2022 at 10:17 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 16.02.22 11:11, Peter Eisentraut wrote:

The remaining patches are material for PG16 at this point, and I will
set the commit fest item to returned with feedback in the meantime.

Time to continue with this.

Attached is a rebased and cleaned up patch for non-decimal integer
literals. (I don't include the underscores-in-numeric literals patch.
I'm keeping that for later.)

Two open issues from my notes:

Technically, numeric_in() should be made aware of this, but that seems
relatively complicated and maybe not necessary for the first iteration.

Taking another look around ecpg to see how this interacts with C-syntax
integer literals. I'm not aware of any particular issues, but it's
understandably tricky.

Other than that, this seems pretty complete as a start.

--
Regards
Junwang Zhao

#31

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 3 years ago

In reply to: Junwang Zhao (#30)

Re: Non-decimal integer literals

On 11.10.22 05:29, Junwang Zhao wrote:

What do you think about moving this code into a static inline function? Something like:

static inline char*
process_digits(char *ptr, int32 *result)
{
...
}

How would you handle the different ways each branch checks for valid
digits and computes the value of each digit?

#32

Junwang Zhao

zhjwpku@gmail.com

over 3 years ago

In reply to: Peter Eisentraut (#31)

Re: Non-decimal integer literals

On Tue, Oct 11, 2022 at 4:59 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 11.10.22 05:29, Junwang Zhao wrote:

What do you think about moving this code into a static inline function? Something like:

static inline char*
process_digits(char *ptr, int32 *result)
{
...
}

How would you handle the different ways each branch checks for valid
digits and computes the value of each digit?

Didn't notice that, sorry for the noise ;(

--
Regards
Junwang Zhao

#33

John Naylor

john.naylor@enterprisedb.com

about 3 years ago

In reply to: Peter Eisentraut (#29)

Re: Non-decimal integer literals

On Mon, Oct 10, 2022 at 9:17 PM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

Taking another look around ecpg to see how this interacts with C-syntax
integer literals. I'm not aware of any particular issues, but it's
understandably tricky.

I can find no discussion in the archives about the commit that added "xch":

commit 6fb3c3f78fbb2296894424f6e3183d339915eac7
Author: Michael Meskes <meskes@postgresql.org>
Date: Fri Oct 15 19:02:08 1999 +0000

*** empty log message ***

...and I can't think of why bounds checking a C literal was done like this.
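For reference, the numeric check performed by the removed `<C>{xch}` rule can be re-created outside ecpg. In this sketch the function name is made up, and `ival` is assumed to be a plain `int`, as in ecpg's token union. Because the `strtoul()` result is narrowed to `int`, only overflow of `unsigned long` is caught: a literal such as `0xFFFFFFFF` is still reported as an integer constant even though it does not fit in an `int`. A bare `0x` (which the rule's `[0-9A-Fa-f]*` allowed) leaves `endptr` pointing at the `x`, so it fell back to a string constant:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Re-creation of the numeric check in the removed ecpg <C>{xch} rule
 * (function name is made up).  Returns 1 where the rule returned ICONST,
 * 0 where it fell back to SCONST.
 */
static int
old_xch_is_iconst(const char *yytext, int *ival)
{
	char	   *endptr;

	errno = 0;
	/* unsigned long narrowed to int: implementation-defined above INT_MAX */
	*ival = (int) strtoul(yytext, &endptr, 16);
	if (*endptr != '\0' || errno == ERANGE)
	{
		errno = 0;
		return 0;
	}
	return 1;
}
```

With the patch, C-mode hex literals instead go through `process_integer_literal()`, which uses `strtoint()` and, per its comment, treats a too-large value as a float.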

Regarding the patch, it looks good overall. My only suggestion would be to
add a regression test for just below and just above overflow, at least for
int2.
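For picking such boundary inputs, a small formatter can print a value with the radix prefixes the patch accepts (0x, 0o, 0b). This is an illustrative helper for generating test literals, not part of the patch; the name `format_radix` is made up:

```c
#include <limits.h>
#include <string.h>

/*
 * Illustrative helper: write v in base 2, 8, 10 or 16 into buf using the
 * prefixes 0b, 0o, (none) and 0x.  buf needs room for a 2-byte prefix,
 * one digit per bit of unsigned long long, and a NUL (67 bytes for 64 bits).
 */
static char *
format_radix(unsigned long long v, int base, char *buf)
{
	static const char digits[] = "0123456789ABCDEF";
	const char *prefix = (base == 16) ? "0x" :
						 (base == 8) ? "0o" :
						 (base == 2) ? "0b" : "";
	char		tmp[sizeof(unsigned long long) * CHAR_BIT];
	int			n = 0;
	size_t		len = strlen(prefix);

	/* collect digits least-significant first */
	do
	{
		tmp[n++] = digits[v % base];
		v /= base;
	} while (v > 0);

	memcpy(buf, prefix, len);
	while (n > 0)				/* copy them back in reading order */
		buf[len++] = tmp[--n];
	buf[len] = '\0';
	return buf;
}
```

For int2, the values 32767 and 32768 give 0x7FFF/0x8000, 0o77777/0o100000 and 0b111111111111111/0b1000000000000000. Because the input functions accumulate negatively, the negative side is asymmetric: -0x8000 (-32768) should be accepted, while -0x8001 should be rejected.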

Minor nits:

- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},

I scratched my head for a while, thinking this was Flex syntax, until I
realized my brain was supposed to do shell-globbing first, at which point I
could see it was referring to multiple Flex rules. I'd try to rephrase.

+T661 Non-decimal integer literals YES SQL:202x draft

Is there an ETA yet?

Also, it's not this patch's job to do it, but there are at least a half
dozen places that open-code turning a hex char into a number. It might be a
good easy "todo item" to unify that.
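A shared helper of the kind suggested could be as small as the following. The name `hex_digit_value` is hypothetical. For ASCII input it returns the same values as the patch's 128-entry `hexlookup` table, but it is safe for any `unsigned char`, whereas the table relies on the caller checking `isxdigit()` first, as the patch does:

```c
/*
 * Hypothetical shared helper: value of an ASCII hexadecimal digit,
 * or -1 if c is not one.  No table, so no bounds concerns for c >= 128.
 */
static inline int
hex_digit_value(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}
```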

--
John Naylor
EDB: http://www.enterprisedb.com

#34

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: John Naylor (#33)

Re: Non-decimal integer literals

On 14.11.22 08:25, John Naylor wrote:

Regarding the patch, it looks good overall. My only suggestion would be
to add a regression test for just below and just above overflow, at
least for int2.

Minor nits:
- * Process {integer}.  Note this will also do the right thing with {decimal},
+ * Process {*integer}.  Note this will also do the right thing with {numeric},
I scratched my head for a while, thinking this was Flex syntax, until I
realized my brain was supposed to do shell-globbing first, at which
point I could see it was referring to multiple Flex rules. I'd try to
rephrase.

+T661 Non-decimal integer literals YES SQL:202x draft

Is there an ETA yet?

March 2023

Also, it's not this patch's job to do it, but there are at least a half
dozen places that open-code turning a hex char into a number. It might
be a good easy "todo item" to unify that.

right

#35

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: Peter Eisentraut (#34)

1 attachment(s)

Re: Non-decimal integer literals

On 15.11.22 11:31, Peter Eisentraut wrote:

On 14.11.22 08:25, John Naylor wrote:

Regarding the patch, it looks good overall. My only suggestion would
be to add a regression test for just below and just above overflow, at
least for int2.

ok

This was a valuable suggestion, because this found some breakage. In
particular, the handling of grammar-level literals that overflow to
"Float" was not correct. (The radix prefix was simply stripped and
forgotten.) So I added a bunch more tests for this. Here is a new patch.

Attachments:

v10-0001-Non-decimal-integer-literals.patch (text/plain)

From c0daab31eb145fbe54c2822bc093d774b993cd3d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 22 Nov 2022 14:22:09 +0100
Subject: [PATCH v10] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  34 +++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/parse_node.c            |  24 ++-
 src/backend/parser/scan.l                  | 101 +++++++++---
 src/backend/utils/adt/numutils.c           | 170 +++++++++++++++++++--
 src/fe_utils/psqlscan.l                    |  78 +++++++---
 src/interfaces/ecpg/preproc/pgc.l          | 106 +++++++------
 src/test/regress/expected/int2.out         |  80 ++++++++++
 src/test/regress/expected/int4.out         |  80 ++++++++++
 src/test/regress/expected/int8.out         |  80 ++++++++++
 src/test/regress/expected/numerology.out   | 127 ++++++++++++++-
 src/test/regress/sql/int2.sql              |  22 +++
 src/test/regress/sql/int4.sql              |  22 +++
 src/test/regress/sql/int8.sql              |  22 +++
 src/test/regress/sql/numerology.sql        |  37 ++++-
 16 files changed, 881 insertions(+), 109 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f51..956182e7c6a8 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
+    <note>
+     <para>
+      Nondecimal integer constants are currently only supported in the range
+      of the <type>bigint</type> type (see <xref
+      linkend="datatype-numeric-table"/>).
+     </para>
+    </note>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 18725a02d1fb..95c27a625e7e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index da7c9c772e09..e897e28ed148 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 4014db4b80f9..c7085a01b7f5 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -385,11 +385,33 @@ make_const(ParseState *pstate, A_Const *aconst)
 			{
 				/* could be an oversize integer as well as a float ... */
 
+				int			base = 10;
+				const char *startptr;
 				int64		val64;
 				char	   *endptr;
 
+				startptr = aconst->val.fval.fval;
+				if (startptr[0] == '0')
+				{
+					if (startptr[1] == 'b' || startptr[1] == 'B')
+					{
+						base = 2;
+						startptr += 2;
+					}
+					else if (startptr[1] == 'o' || startptr[1] == 'O')
+					{
+						base = 8;
+						startptr += 2;
+					}
+					else if (startptr[1] == 'x' || startptr[1] == 'X')
+					{
+						base = 16;
+						startptr += 2;
+					}
+				}
+
 				errno = 0;
-				val64 = strtoi64(aconst->val.fval.fval, &endptr, 10);
+				val64 = strtoi64(startptr, &endptr, base);
 				if (errno == 0 && *endptr == '\0')
 				{
 					/*
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8ebcc..9ad9e0c8ba74 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,25 +385,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 other			.
 
@@ -983,20 +998,44 @@ other			.
 					yyerror("trailing junk after parameter");
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 2);
+				}
+{hexfail}		{
+					SET_YYLLOC();
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					SET_YYLLOC();
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1007,11 +1046,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1307,17 +1358,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
- * ie digits and a decimal point.
+ * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
+ * thing with {numeric}, ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 834ec0b5882c..2942b7c44904 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -120,13 +131,56 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		int8		digit = (*ptr++ - '0');
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
 
-		if (unlikely(pg_mul_s16_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 10, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
 	}
 
 	/* allow trailing whitespace, but not other trailing chars */
@@ -196,13 +250,56 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		int8		digit = (*ptr++ - '0');
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
 
-		if (unlikely(pg_mul_s32_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 10, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
 	}
 
 	/* allow trailing whitespace, but not other trailing chars */
@@ -280,13 +377,56 @@ pg_strtoint64(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		int8		digit = (*ptr++ - '0');
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
 
-		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
 	}
 
 	/* allow trailing whitespace, but not other trailing chars */
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec24077..cb1fc5213844 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,25 +323,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -847,13 +862,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -864,10 +897,19 @@ other			.
 {realfail}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index c145c9698f1a..2c09c6cb4f35 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char ychar);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,25 +351,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -399,9 +414,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -414,7 +426,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -932,17 +944,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
-{decimal}		{
+{hexinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 16);
+				}
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -951,22 +966,38 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail}		{
 					/*
 					 * throw back the [Ee][+-], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
 <SQL>{
+{octinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 8);
+				}
+{bininteger}	{
+					return process_integer_literal(yytext, &base_yylval, 2);
+				}
+
 	/*
 	 * Note that some trailing junk is valid in C (such as 100LL), so we
 	 * contain this to SQL mode.
 	 */
-{integer_junk}	{
+{decinteger_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{hexinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{numeric_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
 {real_junk}		{
@@ -1036,19 +1067,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *) yytext, &endptr, 16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1573,17 +1591,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
- * ie digits and a decimal point.
+ * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
+ * thing with {numeric}, ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 109cf9baaaca..37cbd419fa40 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -304,3 +304,83 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
+-- cases near overflow
+SELECT int2 '0b111111111111111';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0b1000000000000000';
+ERROR:  value "0b1000000000000000" is out of range for type smallint
+LINE 1: SELECT int2 '0b1000000000000000';
+                    ^
+SELECT int2 '0o77777';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0o100000';
+ERROR:  value "0o100000" is out of range for type smallint
+LINE 1: SELECT int2 '0o100000';
+                    ^
+SELECT int2 '0x7FFF';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0x8000';
+ERROR:  value "0x8000" is out of range for type smallint
+LINE 1: SELECT int2 '0x8000';
+                    ^
+SELECT int2 '-0b1000000000000000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0b1000000000000001';
+ERROR:  value "-0b1000000000000001" is out of range for type smallint
+LINE 1: SELECT int2 '-0b1000000000000001';
+                    ^
+SELECT int2 '-0o100000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0o100001';
+ERROR:  value "-0o100001" is out of range for type smallint
+LINE 1: SELECT int2 '-0o100001';
+                    ^
+SELECT int2 '-0x8000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0x8001';
+ERROR:  value "-0x8001" is out of range for type smallint
+LINE 1: SELECT int2 '-0x8001';
+                    ^
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index fbcc0e8d9e68..718fa3efc902 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -431,3 +431,83 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
+-- cases near overflow
+SELECT int4 '0b1111111111111111111111111111111';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0b10000000000000000000000000000000';
+ERROR:  value "0b10000000000000000000000000000000" is out of range for type integer
+LINE 1: SELECT int4 '0b10000000000000000000000000000000';
+                    ^
+SELECT int4 '0o17777777777';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0o20000000000';
+ERROR:  value "0o20000000000" is out of range for type integer
+LINE 1: SELECT int4 '0o20000000000';
+                    ^
+SELECT int4 '0x7FFFFFFF';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0x80000000';
+ERROR:  value "0x80000000" is out of range for type integer
+LINE 1: SELECT int4 '0x80000000';
+                    ^
+SELECT int4 '-0b10000000000000000000000000000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0b10000000000000000000000000000001';
+ERROR:  value "-0b10000000000000000000000000000001" is out of range for type integer
+LINE 1: SELECT int4 '-0b10000000000000000000000000000001';
+                    ^
+SELECT int4 '-0o20000000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0o20000000001';
+ERROR:  value "-0o20000000001" is out of range for type integer
+LINE 1: SELECT int4 '-0o20000000001';
+                    ^
+SELECT int4 '-0x80000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0x80000001';
+ERROR:  value "-0x80000001" is out of range for type integer
+LINE 1: SELECT int4 '-0x80000001';
+                    ^
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 1ae23cf3f94f..ab35b53cc4bd 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -927,3 +927,83 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
+-- cases near overflow
+SELECT int8 '0b111111111111111111111111111111111111111111111111111111111111111';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0b1000000000000000000000000000000000000000000000000000000000000000';
+ERROR:  value "0b1000000000000000000000000000000000000000000000000000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0b100000000000000000000000000000000000000000000...
+                    ^
+SELECT int8 '0o777777777777777777777';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0o1000000000000000000000';
+ERROR:  value "0o1000000000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0o1000000000000000000000';
+                    ^
+SELECT int8 '0x7FFFFFFFFFFFFFFF';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0x8000000000000000';
+ERROR:  value "0x8000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0x8000000000000000';
+                    ^
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000001';
+ERROR:  value "-0b1000000000000000000000000000000000000000000000000000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0b10000000000000000000000000000000000000000000...
+                    ^
+SELECT int8 '-0o1000000000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0o1000000000000000000001';
+ERROR:  value "-0o1000000000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0o1000000000000000000001';
+                    ^
+SELECT int8 '-0x8000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0x8000000000000001';
+ERROR:  value "-0x8000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0x8000000000000001';
+                    ^
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 77d48434173b..162c7c6f72b9 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,101 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- cases near int4 overflow
+SELECT 0b1111111111111111111111111111111;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0b10000000000000000000000000000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT 0o17777777777;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0o20000000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT 0x7FFFFFFF;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0x80000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+-- cases near int8 overflow
+SELECT 0b111111111111111111111111111111111111111111111111111111111111111;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0b1000000000000000000000000000000000000000000000000000000000000000;
+ERROR:  invalid input syntax for type numeric: "0b1000000000000000000000000000000000000000000000000000000000000000"
+LINE 1: SELECT 0b100000000000000000000000000000000000000000000000000...
+               ^
+SELECT 0o777777777777777777777;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0o1000000000000000000000;
+ERROR:  invalid input syntax for type numeric: "0o1000000000000000000000"
+LINE 1: SELECT 0o1000000000000000000000;
+               ^
+SELECT 0x7FFFFFFFFFFFFFFF;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0x8000000000000000;
+ERROR:  invalid input syntax for type numeric: "0x8000000000000000"
+LINE 1: SELECT 0x8000000000000000;
+               ^
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -45,6 +132,42 @@ PREPARE p1 AS SELECT $1a;
 ERROR:  trailing junk after parameter at or near "$1a"
 LINE 1: PREPARE p1 AS SELECT $1a;
                              ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+               ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+               ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+               ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index ea29066b78ee..9809e87d52f2 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -104,3 +104,25 @@
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
+
+-- cases near overflow
+SELECT int2 '0b111111111111111';
+SELECT int2 '0b1000000000000000';
+SELECT int2 '0o77777';
+SELECT int2 '0o100000';
+SELECT int2 '0x7FFF';
+SELECT int2 '0x8000';
+
+SELECT int2 '-0b1000000000000000';
+SELECT int2 '-0b1000000000000001';
+SELECT int2 '-0o100000';
+SELECT int2 '-0o100001';
+SELECT int2 '-0x8000';
+SELECT int2 '-0x8001';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index f19077f3da21..e704dee18a2f 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -164,3 +164,25 @@
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
+
+-- cases near overflow
+SELECT int4 '0b1111111111111111111111111111111';
+SELECT int4 '0b10000000000000000000000000000000';
+SELECT int4 '0o17777777777';
+SELECT int4 '0o20000000000';
+SELECT int4 '0x7FFFFFFF';
+SELECT int4 '0x80000000';
+
+SELECT int4 '-0b10000000000000000000000000000000';
+SELECT int4 '-0b10000000000000000000000000000001';
+SELECT int4 '-0o20000000000';
+SELECT int4 '-0o20000000001';
+SELECT int4 '-0x80000000';
+SELECT int4 '-0x80000001';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 38b771964d79..0a567a81c175 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -245,3 +245,25 @@
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
+
+-- cases near overflow
+SELECT int8 '0b111111111111111111111111111111111111111111111111111111111111111';
+SELECT int8 '0b1000000000000000000000000000000000000000000000000000000000000000';
+SELECT int8 '0o777777777777777777777';
+SELECT int8 '0o1000000000000000000000';
+SELECT int8 '0x7FFFFFFFFFFFFFFF';
+SELECT int8 '0x8000000000000000';
+
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000000';
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000001';
+SELECT int8 '-0o1000000000000000000000';
+SELECT int8 '-0o1000000000000000000001';
+SELECT int8 '-0x8000000000000000';
+SELECT int8 '-0x8000000000000001';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index be7d6dfe0c26..22a671e34ae6 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,32 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- cases near int4 overflow
+SELECT 0b1111111111111111111111111111111;
+SELECT 0b10000000000000000000000000000000;
+SELECT 0o17777777777;
+SELECT 0o20000000000;
+SELECT 0x7FFFFFFF;
+SELECT 0x80000000;
+
+-- cases near int8 overflow
+SELECT 0b111111111111111111111111111111111111111111111111111111111111111;
+SELECT 0b1000000000000000000000000000000000000000000000000000000000000000;
+SELECT 0o777777777777777777777;
+SELECT 0o1000000000000000000000;
+SELECT 0x7FFFFFFFFFFFFFFF;
+SELECT 0x8000000000000000;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -18,6 +40,19 @@
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.38.1
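The error-case tests in this patch (`SELECT 0x;`, `SELECT 1x;`, `SELECT 0x0y;` and their octal/binary counterparts) distinguish three lexer outcomes: a bare radix prefix, a valid literal, and a literal running into identifier characters. As a rough illustration only, the standalone C sketch below approximates how the new `scan.l` rules classify such a token. `classify_number()` and `TokKind` are hypothetical names, and flex's behaviour (longest match wins, ties go to the rule listed first, which is why `{hexfail}` beats `{decinteger_junk}` on a bare `0x`) is imitated by hand rather than reproduced exactly.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token kinds, named after the scan.l rules they imitate. */
typedef enum
{
	TOK_DECINTEGER,				/* {decinteger} */
	TOK_HEXINTEGER,				/* {hexinteger} */
	TOK_HEXFAIL,				/* {hexfail}: "invalid hexadecimal integer" */
	TOK_JUNK					/* {decinteger_junk} or {hexinteger_junk}:
								 * "trailing junk after numeric literal" */
} TokKind;

/* Approximates scan.l's {ident_start} class [A-Za-z\200-\377_]. */
static int
is_ident_start(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		c == '_' || c >= 0200;
}

/*
 * Classify an integer token that starts with a digit.  Only the decimal
 * and hexadecimal cases are shown (octal 0o and binary 0b work the same
 * way), and decimal points and exponents are not handled at all.
 */
static TokKind
classify_number(const char *s)
{
	const unsigned char *p = (const unsigned char *) s;

	if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'))
	{
		const unsigned char *q = p + 2;

		while (isxdigit(*q))
			q++;
		if (q == p + 2)
			return TOK_HEXFAIL;	/* prefix with no hex digits after it */
		return is_ident_start(*q) ? TOK_JUNK : TOK_HEXINTEGER;
	}

	while (isdigit(*p))
		p++;
	return is_ident_start(*p) ? TOK_JUNK : TOK_DECINTEGER;
}
```

With this sketch, `"0x"` classifies as `TOK_HEXFAIL`, while `"0x0y"` and `"1x"` classify as `TOK_JUNK`, in line with the expected error messages in numerology.out.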

#36

John Naylor

john.naylor@enterprisedb.com

about 3 years ago

In reply to: Peter Eisentraut (#35)

Re: Non-decimal integer literals

On Tue, Nov 22, 2022 at 8:36 PM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 15.11.22 11:31, Peter Eisentraut wrote:

On 14.11.22 08:25, John Naylor wrote:

Regarding the patch, it looks good overall. My only suggestion would
be to add a regression test for just below and just above overflow, at
least for int2.

ok

This was a valuable suggestion, because this found some breakage. In
particular, the handling of grammar-level literals that overflow to
"Float" was not correct. (The radix prefix was simply stripped and
forgotten.) So I added a bunch more tests for this. Here is a new patch.

Looks good to me.

--
John Naylor
EDB: http://www.enterprisedb.com
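Peter's description of the breakage points at the invariant that matters: when a non-decimal literal is too large for `int4` and is passed on as a `Float` string, whatever re-parses it must still know its radix. A minimal sketch of that re-parse step follows, assuming standard `strtoll()` in place of PostgreSQL's `strtoi64()` and a 64-bit `long long`; `reparse_as_int64()` is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: re-parse an unsigned integer literal token (as the
 * lexer saw it, prefix included) as a 64-bit integer.  The radix comes
 * from the prefix, so it is not lost when the value overflows int4.
 * Assumes long long is 64 bits wide, as on common platforms.
 */
static bool
reparse_as_int64(const char *token, int64_t *result)
{
	int			base = 10;
	const char *digits = token;
	char	   *endptr;
	long long	val;

	if (token[0] == '0')
	{
		switch (token[1])
		{
			case 'x':
			case 'X':
				base = 16;
				digits = token + 2;
				break;
			case 'o':
			case 'O':
				base = 8;
				digits = token + 2;
				break;
			case 'b':
			case 'B':
				base = 2;
				digits = token + 2;
				break;
			default:
				break;
		}
	}

	errno = 0;
	val = strtoll(digits, &endptr, base);
	if (errno != 0 || endptr == digits || *endptr != '\0')
		return false;			/* too big for int64, or not an integer */
	*result = (int64_t) val;
	return true;
}
```

If the prefix were dropped and the remaining digits read in base 10 instead, `0x80000000` would come out as 80000000 rather than 2147483648; boundary tests such as `SELECT 0x80000000;` would catch that.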

#37

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: Peter Eisentraut (#35)

Re: Non-decimal integer literals

On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Here is a new patch.

This looks like quite an inefficient way to convert a hex string into an int64:

while (*ptr && isxdigit((unsigned char) *ptr))
{
	int8		digit = hexlookup[(unsigned char) *ptr];

	if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
		unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
		goto out_of_range;

	ptr++;
}

I wonder if you'd be better off with something like:

while (*ptr && isxdigit((unsigned char) *ptr))
{
	if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
		goto out_of_range;

	tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc. It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

David

[1]: https://godbolt.org/z/jz6Th6jnM
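To make the comparison concrete, here is a standalone sketch of both loop styles on an unsigned 64-bit accumulator. It simplifies the patch in two labelled ways: it accumulates a positive value rather than the negative accumulation done with `pg_sub_s64_overflow()`, and it calls the GCC/Clang `__builtin_mul_overflow()`/`__builtin_add_overflow()` builtins directly instead of PostgreSQL's `pg_mul_*`/`pg_sub_*` wrappers (so it assumes one of those compilers). `hexval()`, `hex_to_u64_checked()` and `hex_to_u64_shift()` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static int
hexval(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return (tolower(c) - 'a') + 10;
}

/* Style of the posted patch: checked multiply by 16, then checked add. */
static bool
hex_to_u64_checked(const char *s, uint64_t *out)
{
	uint64_t	tmp = 0;

	for (; isxdigit((unsigned char) *s); s++)
	{
		if (__builtin_mul_overflow(tmp, (uint64_t) 16, &tmp) ||
			__builtin_add_overflow(tmp, (uint64_t) hexval((unsigned char) *s), &tmp))
			return false;
	}
	*out = tmp;
	return true;
}

/* Style of the proposal: refuse to shift if any of the top 4 bits is set. */
static bool
hex_to_u64_shift(const char *s, uint64_t *out)
{
	uint64_t	tmp = 0;

	for (; isxdigit((unsigned char) *s); s++)
	{
		if (tmp & UINT64_C(0xF000000000000000))
			return false;
		tmp = (tmp << 4) | (uint64_t) hexval((unsigned char) *s);
	}
	*out = tmp;
	return true;
}
```

The two functions accept and reject the same inputs: multiplying by 16 overflows exactly when one of the top four bits is already set, and after the shift the low four bits are zero, so OR-ing in a digit below 16 cannot overflow.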

#38

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: David Rowley (#37)

1 attachment(s)

Re: Non-decimal integer literals

On Wed, 23 Nov 2022 at 21:54, David Rowley <dgrowleyml@gmail.com> wrote:

I wonder if you'd be better off with something like:

while (*ptr && isxdigit((unsigned char) *ptr))
{
	if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
		goto out_of_range;

	tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}

Here's a delta diff with it changed to work that way.

David

Attachments:

more_efficient_hex_oct_and_binary_processing.diff

diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 2942b7c449..ce305b611d 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -136,13 +136,10 @@ pg_strtoint16(const char *s)
 		ptr += 2;
 		while (*ptr && isxdigit((unsigned char) *ptr))
 		{
-			int8		digit = hexlookup[(unsigned char) *ptr];
-
-			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
-				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & 0xF000))
 				goto out_of_range;
 
-			ptr++;
+			tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
@@ -151,11 +148,10 @@ pg_strtoint16(const char *s)
 
 		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
 		{
-			int8		digit = (*ptr++ - '0');
-
-			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
-				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & 0xE000))
 				goto out_of_range;
+
+			tmp = (tmp << 3) | (*ptr++ - '0');
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
@@ -164,11 +160,10 @@ pg_strtoint16(const char *s)
 
 		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
 		{
-			int8		digit = (*ptr++ - '0');
-
-			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
-				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & 0x8000))
 				goto out_of_range;
+
+			tmp = (tmp << 1) | (*ptr++ - '0');
 		}
 	}
 	else
@@ -255,13 +250,10 @@ pg_strtoint32(const char *s)
 		ptr += 2;
 		while (*ptr && isxdigit((unsigned char) *ptr))
 		{
-			int8		digit = hexlookup[(unsigned char) *ptr];
-
-			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
-				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & 0xF0000000))
 				goto out_of_range;
 
-			ptr++;
+			tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
@@ -270,11 +262,10 @@ pg_strtoint32(const char *s)
 
 		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
 		{
-			int8		digit = (*ptr++ - '0');
-
-			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
-				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & 0xE0000000))
 				goto out_of_range;
+
+			tmp = (tmp << 3) | (*ptr++ - '0');
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
@@ -283,11 +274,10 @@ pg_strtoint32(const char *s)
 
 		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
 		{
-			int8		digit = (*ptr++ - '0');
-
-			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
-				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & 0x80000000))
 				goto out_of_range;
+
+			tmp = (tmp << 1) | (*ptr++ - '0');
 		}
 	}
 	else
@@ -382,13 +372,10 @@ pg_strtoint64(const char *s)
 		ptr += 2;
 		while (*ptr && isxdigit((unsigned char) *ptr))
 		{
-			int8		digit = hexlookup[(unsigned char) *ptr];
-
-			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
-				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
 				goto out_of_range;
 
-			ptr++;
+			tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
@@ -397,11 +384,10 @@ pg_strtoint64(const char *s)
 
 		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
 		{
-			int8		digit = (*ptr++ - '0');
-
-			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
-				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & UINT64CONST(0xE000000000000000)))
 				goto out_of_range;
+
+			tmp = (tmp << 3) | (*ptr++ - '0');
 		}
 	}
 	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
@@ -410,11 +396,10 @@ pg_strtoint64(const char *s)
 
 		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
 		{
-			int8		digit = (*ptr++ - '0');
-
-			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
-				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+			if (unlikely(tmp & UINT64CONST(0x8000000000000000)))
 				goto out_of_range;
+
+			tmp = (tmp << 1) | (*ptr++ - '0');
 		}
 	}
 	else
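The constants in this delta diff follow one rule: before a W-bit accumulator is shifted left by k bits (k = 4 for hexadecimal, 3 for octal, 1 for binary), its top k bits must be zero. A small sketch that derives them is below; `shift_overflow_mask()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits that must be clear in a width-bit accumulator before it can be
 * shifted left by bits_per_digit without losing information.
 * Assumes 0 < bits_per_digit < width <= 64.
 */
static uint64_t
shift_overflow_mask(unsigned width, unsigned bits_per_digit)
{
	uint64_t	digit_bits;

	assert(bits_per_digit > 0 && bits_per_digit < width && width <= 64);
	digit_bits = (UINT64_C(1) << bits_per_digit) - 1;
	return digit_bits << (width - bits_per_digit);
}
```

For example, `shift_overflow_mask(16, 3)` gives 0xE000 and `shift_overflow_mask(64, 1)` gives 0x8000000000000000. The mask only guarantees that no bits are lost from an unsigned W-bit value; the accumulated value can still reach 2^W - 1 (0xFFFF for `int2`), so a separate range and sign check for the signed target type is still needed, which this sketch does not attempt.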

#39

John Naylor

john.naylor@enterprisedb.com

about 3 years ago

In reply to: David Rowley (#37)

Re: Non-decimal integer literals

On Wed, Nov 23, 2022 at 3:54 PM David Rowley <dgrowleyml@gmail.com> wrote:

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc. It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

FWIW, gcc 12.2 generates an imul on my system when compiling in situ. I've
found it useful to run godbolt locally* and load the entire PG file (nicer
to read than plain objdump) -- compilers can make different decisions when
going from isolated snippets to within full functions.

* To set it up: clone https://github.com/compiler-explorer/compiler-explorer,
install npm 16, run "make" (when it finishes it will show the localhost URL),
and add the right flags, which in this case were:

-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla -Wendif-labels -Wmissing-format-attribute
-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security
-fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation -Wno-stringop-truncation -O2
-I/path/to/srcdir/src/include -I/path/to/builddir/src/include -D_GNU_SOURCE

--
John Naylor
EDB: http://www.enterprisedb.com

#40

Dean Rasheed

dean.a.rasheed@gmail.com

about 3 years ago

In reply to: Peter Eisentraut (#35)

Re: Non-decimal integer literals

On Tue, 22 Nov 2022 at 13:37, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 15.11.22 11:31, Peter Eisentraut wrote:

On 14.11.22 08:25, John Naylor wrote:

Regarding the patch, it looks good overall. My only suggestion would
be to add a regression test for just below and just above overflow, at
least for int2.

This was a valuable suggestion, because this found some breakage. In
particular, the handling of grammar-level literals that overflow to
"Float" was not correct. (The radix prefix was simply stripped and
forgotten.) So I added a bunch more tests for this. Here is a new patch.

Taking a quick look, I noticed that there are no tests for negative
values handled in the parser.

Giving that a spin shows that make_const() fails to correctly identify
the base of negative non-decimal integers in the T_Float case, causing
it to fall through to numeric_in() and fail:

SELECT -0x80000000;

ERROR: invalid input syntax for type numeric: "-0x80000000"
^
Regards,
Dean
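Dean's example presumably fails because the leading minus sign sits in front of the radix prefix, so base detection that only looks at the start of the string never sees the `0x`. The fix idea is to strip the sign first, detect the prefix, and apply the sign after parsing. The v11 patch does this by re-attaching the sign with `psprintf()` and calling `strtoi64()`; the sketch below is an alternative formulation rather than the patch's code: it parses the unsigned magnitude with standard `strtoull()` and applies the sign afterwards. `signed_literal_to_int64()` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert an integer literal such as "-0x80000000"
 * (optional '-', optional 0x/0o/0b prefix, digits) to int64.  Returns
 * false if the text is malformed or out of int64 range.
 */
static bool
signed_literal_to_int64(const char *token, int64_t *out)
{
	bool		neg = false;
	int			base = 10;
	const char *p = token;
	char	   *endptr;
	unsigned long long mag;
	const unsigned long long limit = (unsigned long long) INT64_MAX;

	/* Strip the sign first, so that any radix prefix is at the front of p. */
	if (*p == '-')
	{
		neg = true;
		p++;
	}

	if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'))
	{
		base = 16;
		p += 2;
	}
	else if (p[0] == '0' && (p[1] == 'o' || p[1] == 'O'))
	{
		base = 8;
		p += 2;
	}
	else if (p[0] == '0' && (p[1] == 'b' || p[1] == 'B'))
	{
		base = 2;
		p += 2;
	}

	/* strtoull() would skip spaces and accept a sign; reject those here. */
	if (!isalnum((unsigned char) *p))
		return false;

	errno = 0;
	mag = strtoull(p, &endptr, base);
	if (errno != 0 || *endptr != '\0')
		return false;

	if (neg)
	{
		/* The magnitude of INT64_MIN is INT64_MAX + 1. */
		if (mag > limit + 1)
			return false;
		*out = (mag == limit + 1) ? INT64_MIN : -(int64_t) mag;
	}
	else
	{
		if (mag > limit)
			return false;
		*out = (int64_t) mag;
	}
	return true;
}
```

With this approach `-0x80000000` yields -2147483648 and `-0x8000000000000000` yields `INT64_MIN`, while `-0x8000000000000001` and `0x8000000000000000` are rejected, matching the boundary values used in the int8 regression tests.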

#41

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: David Rowley (#37)

Re: Non-decimal integer literals

On 23.11.22 09:54, David Rowley wrote:

On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Here is a new patch.

This looks like quite an inefficient way to convert a hex string into an int64:

while (*ptr && isxdigit((unsigned char) *ptr))
{
	int8		digit = hexlookup[(unsigned char) *ptr];

	if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
		unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
		goto out_of_range;

	ptr++;
}

I wonder if you'd be better off with something like:

while (*ptr && isxdigit((unsigned char) *ptr))
{
	if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
		goto out_of_range;

	tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc. It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

My code follows the style used for parsing the decimal integers.
Keeping that consistent is valuable I think. I think the proposed
change makes the code significantly harder to understand. Also, what
you are suggesting here would amount to an attempt to make parsing
hexadecimal integers even faster than parsing decimal integers. Is that
useful?

#42

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: Peter Eisentraut (#41)

Re: Non-decimal integer literals

On Thu, 24 Nov 2022 at 21:35, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

My code follows the style used for parsing the decimal integers.
Keeping that consistent is valuable I think. I think the proposed
change makes the code significantly harder to understand. Also, what
you are suggesting here would amount to an attempt to make parsing
hexadecimal integers even faster than parsing decimal integers. Is that
useful?

Isn't it being faster one of the major use cases for this feature? I
remember many years ago and several jobs ago when working with SQL
Server being able to speed up importing data using hexadecimal
DATETIMEs. I can't think why else you might want to represent a
DATETIME as a hexstring, so I assumed this was a large part of the use
case for INTs in PostgreSQL. Are you telling me that better
performance is not something anyone will want out of this feature?

David

#43

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: David Rowley (#42)

Re: Non-decimal integer literals

On 24.11.22 10:13, David Rowley wrote:

On Thu, 24 Nov 2022 at 21:35, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

My code follows the style used for parsing the decimal integers.
Keeping that consistent is valuable I think. I think the proposed
change makes the code significantly harder to understand. Also, what
you are suggesting here would amount to an attempt to make parsing
hexadecimal integers even faster than parsing decimal integers. Is that
useful?

Isn't it being faster one of the major use cases for this feature?

Never thought about it that way.

I
remember many years ago and several jobs ago when working with SQL
Server being able to speed up importing data using hexadecimal
DATETIMEs. I can't think why else you might want to represent a
DATETIME as a hexstring, so I assumed this was a large part of the use
case for INTs in PostgreSQL. Are you telling me that better
performance is not something anyone will want out of this feature?

This isn't about datetimes but about integers.

#44

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: Dean Rasheed (#40)

1 attachment(s)

Re: Non-decimal integer literals

On 23.11.22 17:25, Dean Rasheed wrote:

Taking a quick look, I noticed that there are no tests for negative
values handled in the parser.

Giving that a spin shows that make_const() fails to correctly identify
the base of negative non-decimal integers in the T_Float case, causing
it to fall through to numeric_in() and fail:

Fixed in new patch.

Attachments:

v11-0001-Non-decimal-integer-literals.patch

From 2d7f41981187df904e3d985f2770d9b5c26e9d7c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 28 Nov 2022 09:24:20 +0100
Subject: [PATCH v11] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  34 ++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/parse_node.c            |  37 +++-
 src/backend/parser/scan.l                  | 101 ++++++++---
 src/backend/utils/adt/numutils.c           | 170 ++++++++++++++++--
 src/fe_utils/psqlscan.l                    |  78 +++++++--
 src/interfaces/ecpg/preproc/pgc.l          | 106 ++++++-----
 src/test/regress/expected/int2.out         |  80 +++++++++
 src/test/regress/expected/int4.out         |  80 +++++++++
 src/test/regress/expected/int8.out         |  80 +++++++++
 src/test/regress/expected/numerology.out   | 193 ++++++++++++++++++++-
 src/test/regress/sql/int2.sql              |  22 +++
 src/test/regress/sql/int4.sql              |  22 +++
 src/test/regress/sql/int8.sql              |  22 +++
 src/test/regress/sql/numerology.sql        |  51 +++++-
 16 files changed, 974 insertions(+), 109 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f51..956182e7c6a8 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), and <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     Some examples:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
+    <note>
+     <para>
+      Non-decimal integer constants are currently only supported in the range
+      of the <type>bigint</type> type (see <xref
+      linkend="datatype-numeric-table"/>).
+     </para>
+    </note>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 18725a02d1fb..95c27a625e7e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 8704a42b60a8..abad216b7ee4 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 4014db4b80f9..d33e3c179df7 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -385,11 +385,46 @@ make_const(ParseState *pstate, A_Const *aconst)
 			{
 				/* could be an oversize integer as well as a float ... */
 
+				int			base = 10;
+				char	   *startptr;
+				int			sign;
+				char	   *testvalue;
 				int64		val64;
 				char	   *endptr;
 
+				startptr = aconst->val.fval.fval;
+				if (startptr[0] == '-')
+				{
+					sign = -1;
+					startptr++;
+				}
+				else
+					sign = +1;
+				if (startptr[0] == '0')
+				{
+					if (startptr[1] == 'b' || startptr[1] == 'B')
+					{
+						base = 2;
+						startptr += 2;
+					}
+					else if (startptr[1] == 'o' || startptr[1] == 'O')
+					{
+						base = 8;
+						startptr += 2;
+					}
+					else if (startptr[1] == 'x' || startptr[1] == 'X')
+					{
+						base = 16;
+						startptr += 2;
+					}
+				}
+
+				if (sign == +1)
+					testvalue = startptr;
+				else
+					testvalue = psprintf("-%s", startptr);
 				errno = 0;
-				val64 = strtoi64(aconst->val.fval.fval, &endptr, 10);
+				val64 = strtoi64(testvalue, &endptr, base);
 				if (errno == 0 && *endptr == '\0')
 				{
 					/*
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8ebcc..9ad9e0c8ba74 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,25 +385,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 other			.
 
@@ -983,20 +998,44 @@ other			.
 					yyerror("trailing junk after parameter");
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 2);
+				}
+{hexfail}		{
+					SET_YYLLOC();
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					SET_YYLLOC();
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1007,11 +1046,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1307,17 +1358,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
- * ie digits and a decimal point.
+ * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
+ * thing with {numeric}, ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 834ec0b5882c..2942b7c44904 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -120,13 +131,56 @@ pg_strtoint16(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		int8		digit = (*ptr++ - '0');
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
 
-		if (unlikely(pg_mul_s16_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
+			if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s16_overflow(tmp, 10, &tmp)) ||
+				unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
 	}
 
 	/* allow trailing whitespace, but not other trailing chars */
@@ -196,13 +250,56 @@ pg_strtoint32(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		int8		digit = (*ptr++ - '0');
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
 
-		if (unlikely(pg_mul_s32_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
+			if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s32_overflow(tmp, 10, &tmp)) ||
+				unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
 	}
 
 	/* allow trailing whitespace, but not other trailing chars */
@@ -280,13 +377,56 @@ pg_strtoint64(const char *s)
 		goto invalid_syntax;
 
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		int8		digit = (*ptr++ - '0');
+		ptr += 2;
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			int8		digit = hexlookup[(unsigned char) *ptr];
 
-		if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
-			unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
-			goto out_of_range;
+			if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+
+			ptr++;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
+	}
+	else
+	{
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			int8		digit = (*ptr++ - '0');
+
+			if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
+				unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+				goto out_of_range;
+		}
 	}
 
 	/* allow trailing whitespace, but not other trailing chars */
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec24077..cb1fc5213844 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,25 +323,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -847,13 +862,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -864,10 +897,19 @@ other			.
 {realfail}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index c145c9698f1a..2c09c6cb4f35 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char ychar);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,25 +351,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -399,9 +414,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -414,7 +426,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -932,17 +944,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
-{decimal}		{
+{hexinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 16);
+				}
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -951,22 +966,38 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail}		{
 					/*
 					 * throw back the [Ee][+-], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
 <SQL>{
+{octinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 8);
+				}
+{bininteger}	{
+					return process_integer_literal(yytext, &base_yylval, 2);
+				}
+
 	/*
 	 * Note that some trailing junk is valid in C (such as 100LL), so we
 	 * contain this to SQL mode.
 	 */
-{integer_junk}	{
+{decinteger_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{hexinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{numeric_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
 {real_junk}		{
@@ -1036,19 +1067,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *) yytext, &endptr, 16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1573,17 +1591,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
- * ie digits and a decimal point.
+ * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
+ * thing with {numeric}, ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 109cf9baaaca..37cbd419fa40 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -304,3 +304,83 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
+-- cases near overflow
+SELECT int2 '0b111111111111111';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0b1000000000000000';
+ERROR:  value "0b1000000000000000" is out of range for type smallint
+LINE 1: SELECT int2 '0b1000000000000000';
+                    ^
+SELECT int2 '0o77777';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0o100000';
+ERROR:  value "0o100000" is out of range for type smallint
+LINE 1: SELECT int2 '0o100000';
+                    ^
+SELECT int2 '0x7FFF';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0x8000';
+ERROR:  value "0x8000" is out of range for type smallint
+LINE 1: SELECT int2 '0x8000';
+                    ^
+SELECT int2 '-0b1000000000000000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0b1000000000000001';
+ERROR:  value "-0b1000000000000001" is out of range for type smallint
+LINE 1: SELECT int2 '-0b1000000000000001';
+                    ^
+SELECT int2 '-0o100000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0o100001';
+ERROR:  value "-0o100001" is out of range for type smallint
+LINE 1: SELECT int2 '-0o100001';
+                    ^
+SELECT int2 '-0x8000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0x8001';
+ERROR:  value "-0x8001" is out of range for type smallint
+LINE 1: SELECT int2 '-0x8001';
+                    ^
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index fbcc0e8d9e68..718fa3efc902 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -431,3 +431,83 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
+-- cases near overflow
+SELECT int4 '0b1111111111111111111111111111111';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0b10000000000000000000000000000000';
+ERROR:  value "0b10000000000000000000000000000000" is out of range for type integer
+LINE 1: SELECT int4 '0b10000000000000000000000000000000';
+                    ^
+SELECT int4 '0o17777777777';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0o20000000000';
+ERROR:  value "0o20000000000" is out of range for type integer
+LINE 1: SELECT int4 '0o20000000000';
+                    ^
+SELECT int4 '0x7FFFFFFF';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0x80000000';
+ERROR:  value "0x80000000" is out of range for type integer
+LINE 1: SELECT int4 '0x80000000';
+                    ^
+SELECT int4 '-0b10000000000000000000000000000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0b10000000000000000000000000000001';
+ERROR:  value "-0b10000000000000000000000000000001" is out of range for type integer
+LINE 1: SELECT int4 '-0b10000000000000000000000000000001';
+                    ^
+SELECT int4 '-0o20000000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0o20000000001';
+ERROR:  value "-0o20000000001" is out of range for type integer
+LINE 1: SELECT int4 '-0o20000000001';
+                    ^
+SELECT int4 '-0x80000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0x80000001';
+ERROR:  value "-0x80000001" is out of range for type integer
+LINE 1: SELECT int4 '-0x80000001';
+                    ^
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 1ae23cf3f94f..ab35b53cc4bd 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -927,3 +927,83 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
+-- cases near overflow
+SELECT int8 '0b111111111111111111111111111111111111111111111111111111111111111';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0b1000000000000000000000000000000000000000000000000000000000000000';
+ERROR:  value "0b1000000000000000000000000000000000000000000000000000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0b100000000000000000000000000000000000000000000...
+                    ^
+SELECT int8 '0o777777777777777777777';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0o1000000000000000000000';
+ERROR:  value "0o1000000000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0o1000000000000000000000';
+                    ^
+SELECT int8 '0x7FFFFFFFFFFFFFFF';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0x8000000000000000';
+ERROR:  value "0x8000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0x8000000000000000';
+                    ^
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000001';
+ERROR:  value "-0b1000000000000000000000000000000000000000000000000000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0b10000000000000000000000000000000000000000000...
+                    ^
+SELECT int8 '-0o1000000000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0o1000000000000000000001';
+ERROR:  value "-0o1000000000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0o1000000000000000000001';
+                    ^
+SELECT int8 '-0x8000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0x8000000000000001';
+ERROR:  value "-0x8000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0x8000000000000001';
+                    ^
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 77d48434173b..15cd6b167236 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,167 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- cases near int4 overflow
+SELECT 0b1111111111111111111111111111111;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0b10000000000000000000000000000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT 0o17777777777;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0o20000000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT 0x7FFFFFFF;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0x80000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT -0b10000000000000000000000000000000;
+  ?column?   
+-------------
+ -2147483648
+(1 row)
+
+SELECT -0b10000000000000000000000000000001;
+  ?column?   
+-------------
+ -2147483649
+(1 row)
+
+SELECT -0o20000000000;
+  ?column?   
+-------------
+ -2147483648
+(1 row)
+
+SELECT -0o20000000001;
+  ?column?   
+-------------
+ -2147483649
+(1 row)
+
+SELECT -0x80000000;
+  ?column?   
+-------------
+ -2147483648
+(1 row)
+
+SELECT -0x80000001;
+  ?column?   
+-------------
+ -2147483649
+(1 row)
+
+-- cases near int8 overflow
+SELECT 0b111111111111111111111111111111111111111111111111111111111111111;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0b1000000000000000000000000000000000000000000000000000000000000000;
+ERROR:  invalid input syntax for type numeric: "0b1000000000000000000000000000000000000000000000000000000000000000"
+LINE 1: SELECT 0b100000000000000000000000000000000000000000000000000...
+               ^
+SELECT 0o777777777777777777777;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0o1000000000000000000000;
+ERROR:  invalid input syntax for type numeric: "0o1000000000000000000000"
+LINE 1: SELECT 0o1000000000000000000000;
+               ^
+SELECT 0x7FFFFFFFFFFFFFFF;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0x8000000000000000;
+ERROR:  invalid input syntax for type numeric: "0x8000000000000000"
+LINE 1: SELECT 0x8000000000000000;
+               ^
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000000;
+       ?column?       
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000001;
+ERROR:  invalid input syntax for type numeric: "-0b1000000000000000000000000000000000000000000000000000000000000001"
+LINE 1: SELECT -0b10000000000000000000000000000000000000000000000000...
+               ^
+SELECT -0o1000000000000000000000;
+       ?column?       
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT -0o1000000000000000000001;
+ERROR:  invalid input syntax for type numeric: "-0o1000000000000000000001"
+LINE 1: SELECT -0o1000000000000000000001;
+               ^
+SELECT -0x8000000000000000;
+       ?column?       
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT -0x8000000000000001;
+ERROR:  invalid input syntax for type numeric: "-0x8000000000000001"
+LINE 1: SELECT -0x8000000000000001;
+               ^
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -45,6 +198,42 @@ PREPARE p1 AS SELECT $1a;
 ERROR:  trailing junk after parameter at or near "$1a"
 LINE 1: PREPARE p1 AS SELECT $1a;
                              ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+               ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+               ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+               ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index ea29066b78ee..9809e87d52f2 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -104,3 +104,25 @@
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
+
+-- cases near overflow
+SELECT int2 '0b111111111111111';
+SELECT int2 '0b1000000000000000';
+SELECT int2 '0o77777';
+SELECT int2 '0o100000';
+SELECT int2 '0x7FFF';
+SELECT int2 '0x8000';
+
+SELECT int2 '-0b1000000000000000';
+SELECT int2 '-0b1000000000000001';
+SELECT int2 '-0o100000';
+SELECT int2 '-0o100001';
+SELECT int2 '-0x8000';
+SELECT int2 '-0x8001';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index f19077f3da21..e704dee18a2f 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -164,3 +164,25 @@
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
+
+-- cases near overflow
+SELECT int4 '0b1111111111111111111111111111111';
+SELECT int4 '0b10000000000000000000000000000000';
+SELECT int4 '0o17777777777';
+SELECT int4 '0o20000000000';
+SELECT int4 '0x7FFFFFFF';
+SELECT int4 '0x80000000';
+
+SELECT int4 '-0b10000000000000000000000000000000';
+SELECT int4 '-0b10000000000000000000000000000001';
+SELECT int4 '-0o20000000000';
+SELECT int4 '-0o20000000001';
+SELECT int4 '-0x80000000';
+SELECT int4 '-0x80000001';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 38b771964d79..0a567a81c175 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -245,3 +245,25 @@
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
+
+-- cases near overflow
+SELECT int8 '0b111111111111111111111111111111111111111111111111111111111111111';
+SELECT int8 '0b1000000000000000000000000000000000000000000000000000000000000000';
+SELECT int8 '0o777777777777777777777';
+SELECT int8 '0o1000000000000000000000';
+SELECT int8 '0x7FFFFFFFFFFFFFFF';
+SELECT int8 '0x8000000000000000';
+
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000000';
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000001';
+SELECT int8 '-0o1000000000000000000000';
+SELECT int8 '-0o1000000000000000000001';
+SELECT int8 '-0x8000000000000000';
+SELECT int8 '-0x8000000000000001';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index be7d6dfe0c26..310d9e57663e 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,46 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- cases near int4 overflow
+SELECT 0b1111111111111111111111111111111;
+SELECT 0b10000000000000000000000000000000;
+SELECT 0o17777777777;
+SELECT 0o20000000000;
+SELECT 0x7FFFFFFF;
+SELECT 0x80000000;
+
+SELECT -0b10000000000000000000000000000000;
+SELECT -0b10000000000000000000000000000001;
+SELECT -0o20000000000;
+SELECT -0o20000000001;
+SELECT -0x80000000;
+SELECT -0x80000001;
+
+-- cases near int8 overflow
+SELECT 0b111111111111111111111111111111111111111111111111111111111111111;
+SELECT 0b1000000000000000000000000000000000000000000000000000000000000000;
+SELECT 0o777777777777777777777;
+SELECT 0o1000000000000000000000;
+SELECT 0x7FFFFFFFFFFFFFFF;
+SELECT 0x8000000000000000;
+
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000000;
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000001;
+SELECT -0o1000000000000000000000;
+SELECT -0o1000000000000000000001;
+SELECT -0x8000000000000000;
+SELECT -0x8000000000000001;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -18,6 +54,19 @@
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)

base-commit: cbe6e482d7bf851c6e466697a21dcef7b05cbb59
-- 
2.38.1
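
The error cases in the regression output above follow from how flex picks a rule. It takes the longest match, and when two rules match the same length it uses the one listed first. That is why "0x" is reported as an invalid hexadecimal integer: {hexfail} and {decinteger_junk} both match two characters, and {hexfail} comes first. It is also why "123abc" is reported at "123a". The following is a rough C model of only the integer rules, in the order the patched scan.l lists them. The numeric and real rules are left out, and the names IntTokKind and classify_int_token are made up for this sketch. It reports which rule would win at the start of the input and how many characters that rule consumes.

```c
#include <assert.h>
#include <stddef.h>

/* Integer rules in the order the patched scan.l lists them. */
typedef enum
{
	TOK_NONE,
	TOK_DECINTEGER, TOK_HEXINTEGER, TOK_OCTINTEGER, TOK_BININTEGER,
	TOK_HEXFAIL, TOK_OCTFAIL, TOK_BINFAIL,
	TOK_DECINTEGER_JUNK, TOK_HEXINTEGER_JUNK,
	TOK_OCTINTEGER_JUNK, TOK_BININTEGER_JUNK,
	TOK_COUNT
} IntTokKind;

/* {ident_start}: [A-Za-z\200-\377_] */
static int
is_ident_start(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c >= 0x80;
}

/* Length of the run of {decdigit}, {hexdigit}, {octdigit} or {bindigit} at s. */
static size_t
digit_run(const char *s, int base)
{
	for (size_t n = 0;; n++)
	{
		unsigned char c = (unsigned char) s[n];
		int			ok = (base == 16)
			? (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')
			: (c >= '0' && c < '0' + base);

		if (!ok)
			return n;
	}
}

/* {hexfail} etc.: does s start with "0" followed by lower or upper? */
static int
has_prefix(const char *s, char lower, char upper)
{
	return s[0] == '0' && (s[1] == lower || s[1] == upper);
}

/* {hexinteger} etc.: the prefix plus at least one digit, else 0. */
static size_t
prefixed_len(const char *s, char lower, char upper, int base)
{
	size_t		n = has_prefix(s, lower, upper) ? digit_run(s + 2, base) : 0;

	return n > 0 ? n + 2 : 0;
}

/* {X}_junk = {X}{ident_start}, given the length of the {X} match. */
static size_t
junk_len(const char *s, size_t len)
{
	return (len > 0 && is_ident_start((unsigned char) s[len])) ? len + 1 : 0;
}

/*
 * Return the rule flex would pick at the start of s and store the matched
 * length. A shorter {hexinteger_junk} match that ends in a hex letter
 * (e.g. "0x4" + "E") only ties with {hexinteger} and loses, so only the
 * one-character extension is modelled.
 */
static IntTokKind
classify_int_token(const char *s, size_t *matchlen)
{
	size_t		cand[TOK_COUNT] = {0};
	IntTokKind	best = TOK_NONE;

	assert(s != NULL && matchlen != NULL);
	cand[TOK_DECINTEGER] = digit_run(s, 10);
	cand[TOK_HEXINTEGER] = prefixed_len(s, 'x', 'X', 16);
	cand[TOK_OCTINTEGER] = prefixed_len(s, 'o', 'O', 8);
	cand[TOK_BININTEGER] = prefixed_len(s, 'b', 'B', 2);
	cand[TOK_HEXFAIL] = has_prefix(s, 'x', 'X') ? 2 : 0;
	cand[TOK_OCTFAIL] = has_prefix(s, 'o', 'O') ? 2 : 0;
	cand[TOK_BINFAIL] = has_prefix(s, 'b', 'B') ? 2 : 0;
	cand[TOK_DECINTEGER_JUNK] = junk_len(s, cand[TOK_DECINTEGER]);
	cand[TOK_HEXINTEGER_JUNK] = junk_len(s, cand[TOK_HEXINTEGER]);
	cand[TOK_OCTINTEGER_JUNK] = junk_len(s, cand[TOK_OCTINTEGER]);
	cand[TOK_BININTEGER_JUNK] = junk_len(s, cand[TOK_BININTEGER]);

	*matchlen = 0;
	for (int k = TOK_DECINTEGER; k < TOK_COUNT; k++)
	{
		/* strict '>' keeps the earlier rule when lengths tie */
		if (cand[k] > *matchlen)
		{
			best = (IntTokKind) k;
			*matchlen = cand[k];
		}
	}
	return best;
}
```

If you run the numerology.sql error cases through this model (0b, 1b, 0b0x, 0o, 1o, 0o0x, 0x, 1x, 0x0y), the bare prefixes select the *fail rules and the rest select a *_junk rule. This is consistent with the expected error messages.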

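The patched process_integer_literal() in scan.l and pgc.l works in three steps. When base is not 10 it skips the two-character prefix. It then converts the remaining digits with strtoint(). If the value does not fit in an int, it returns the whole token text as FCONST. Below is a standalone sketch of that logic. Plain strtol() with an explicit INT_MIN/INT_MAX check stands in for PostgreSQL's strtoint(), and LiteralKind, LiteralValue and integer_literal are hypothetical names, not the backend's YYSTYPE.

The downstream handling is inferred from the numerology.out expectations, not from the hunks quoted here. Those expectations suggest that an FCONST such as 0x80000000 is then accepted as a bigint (2147483648). By contrast, 0x8000000000000000 reaches numeric input and is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the grammar's ICONST and FCONST tokens. */
typedef enum
{
	SKETCH_ICONST,
	SKETCH_FCONST
} LiteralKind;

typedef struct
{
	LiteralKind kind;
	int			ival;			/* set when kind == SKETCH_ICONST */
	const char *str;			/* whole token text when kind == SKETCH_FCONST */
} LiteralValue;

/*
 * token is the full lexeme, e.g. "0x42F" or "123"; base is 2, 8, 10 or 16.
 * For non-decimal bases the lexer guarantees a two-character prefix
 * ("0x", "0o" or "0b") followed by at least one digit.
 */
static LiteralValue
integer_literal(const char *token, int base)
{
	LiteralValue lv = {SKETCH_FCONST, 0, token};
	char	   *endptr;
	long		val;

	assert(base == 2 || base == 8 || base == 10 || base == 16);

	errno = 0;
	val = strtol(base == 10 ? token : token + 2, &endptr, base);
	if (*endptr != '\0' || errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return lv;				/* too large (or has a decimal point): keep the text */

	lv.kind = SKETCH_ICONST;
	lv.ival = (int) val;
	lv.str = NULL;
	return lv;
}
```
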
#45

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: Peter Eisentraut (#43)

Re: Non-decimal integer literals

On Sat, 26 Nov 2022 at 05:13, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 24.11.22 10:13, David Rowley wrote:

I remember many years ago and several jobs ago when working with SQL
Server being able to speed up importing data using hexadecimal
DATETIMEs. I can't think why else you might want to represent a
DATETIME as a hexstring, so I assumed this was a large part of the use
case for INTs in PostgreSQL. Are you telling me that better
performance is not something anyone will want out of this feature?

This isn't about datetimes but about integers.

I'm aware. My aim was to show that hex is commonly used as a more
efficient way of getting integer numbers in and out of computers.

Likely it's better for me to quantify this performance increase claim
with some actual performance results.

Here's master (@f0cd57f85) doing copy ab2 from '/tmp/ab.csv';

ab2 is a table with no indexes and just 2 int columns.

16.55% postgres [.] CopyReadLine
7.82% postgres [.] pg_strtoint32
7.60% postgres [.] CopyReadAttributesText
7.06% postgres [.] NextCopyFrom
4.40% postgres [.] CopyFrom

The copy completes in 2512.5278 ms (average time over 10 runs).

Patching master with your v11 patch and copying in hex numbers instead
of decimal numbers shows:

14.39% postgres [.] CopyReadLine
8.60% postgres [.] pg_strtoint32
6.95% postgres [.] NextCopyFrom
6.79% postgres [.] CopyReadAttributesText
4.81% postgres [.] CopyFrom

This shows that we're spending proportionally less time in
CopyReadLine() and proportionally more time in pg_strtoint32(). There
are probably two things going on there: CopyReadLine() is likely faster
because it has fewer bytes to read, and pg_strtoint32() is likely slower
due to additional branching and code size.

This (copy ab2 from '/tmp/abhex.csv') saw an average time of 2720.1387
ms over 10 runs.

Patching master with your v11 patch +
more_efficient_hex_oct_and_binary_processing.diff shows:

15.68% postgres [.] CopyReadLine
7.75% postgres [.] NextCopyFrom
7.73% postgres [.] pg_strtoint32
6.25% postgres [.] CopyReadAttributesText
4.76% postgres [.] CopyFrom

The average time to import the hex version of the csv file comes down
to 2385.7298 ms over 10 runs.

I didn't run any tests to see how much the performance of importing
the decimal representation slowed down from the v11 patch. I assume
there will be a small performance hit due to the extra processing done
in pg_strtoint32().

David

#46

Dean Rasheed

dean.a.rasheed@gmail.com

about 3 years ago

In reply to: David Rowley (#38)

Re: Non-decimal integer literals

On Wed, 23 Nov 2022 at 08:56, David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 23 Nov 2022 at 21:54, David Rowley <dgrowleyml@gmail.com> wrote:

I wonder if you'd be better off with something like:

while (*ptr && isxdigit((unsigned char) *ptr))
{
	if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
		goto out_of_range;

	tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}

Here's a delta diff with it changed to work that way.

This isn't correct, because those functions are meant to accumulate a
negative number in "tmp".

The overflow check can't just ignore the final digit either, so I'm
not sure how much this would end up saving once those issues are
fixed.

Regards,
Dean

#47

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: Dean Rasheed (#46)

Re: Non-decimal integer literals

On Tue, 29 Nov 2022 at 23:11, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Wed, 23 Nov 2022 at 08:56, David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 23 Nov 2022 at 21:54, David Rowley <dgrowleyml@gmail.com> wrote:

I wonder if you'd be better off with something like:

while (*ptr && isxdigit((unsigned char) *ptr))
{
	if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
		goto out_of_range;

	tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}

Here's a delta diff with it changed to work that way.

This isn't correct, because those functions are meant to accumulate a
negative number in "tmp".

Looks like I didn't quite look at that code closely enough.

To make that work we could just form the non-decimal versions in an
unsigned integer of the given size and then check if that's become
greater than -PG_INTXX_MIN after the loop. We'd then just need to
convert it back to its negative form.

i.e:

uint64 tmp2 = 0;
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
	if (unlikely(tmp2 & UINT64CONST(0xF000000000000000)))
		goto out_of_range;

	tmp2 = (tmp2 << 4) | hexlookup[(unsigned char) *ptr++];
}

/* -PG_INT64_MIN itself overflows int64, so form it in unsigned arithmetic */
if (tmp2 > (uint64) -(PG_INT64_MIN + 1) + 1)
	goto out_of_range;
tmp = -((int64) tmp2);
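For reference, a self-contained sketch of the same unsigned-accumulator idea. It uses <stdint.h> types instead of PostgreSQL's int64/uint64, and the helper names (hex_value, parse_hex_int64) are hypothetical, not taken from any patch in this thread. It assumes the 0x prefix and sign have already been consumed, keeps the exact top-nibble test inside the loop, and does the sign-dependent range check once afterwards. The final negation assumes a two's complement platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int
hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;					/* also stops at the terminating NUL */
}

/*
 * Hypothetical sketch: parse the hex digits at ptr (prefix already
 * consumed) into *result, negating if neg is true.  Returns false on
 * overflow, on an empty digit string, or on trailing characters.
 */
static bool
parse_hex_int64(const char *ptr, bool neg, int64_t *result)
{
	uint64_t	acc = 0;
	const char *firstdigit = ptr;
	int			d;

	while ((d = hex_value(*ptr)) >= 0)
	{
		/* exact for uint64: a set top nibble would be shifted out */
		if (acc & UINT64_C(0xF000000000000000))
			return false;
		acc = (acc << 4) | (uint64_t) d;
		ptr++;
	}

	if (ptr == firstdigit || *ptr != '\0')
		return false;			/* e.g. a bare "0x" */

	if (neg)
	{
		/* |INT64_MIN| == INT64_MAX + 1, computed without signed overflow */
		if (acc > (uint64_t) INT64_MAX + 1)
			return false;
		/* unsigned negation is well defined; the cast assumes two's complement */
		*result = (int64_t) (0 - acc);
	}
	else
	{
		if (acc > (uint64_t) INT64_MAX)
			return false;
		*result = (int64_t) acc;
	}
	return true;
}
```

With this, 8000000000000000 is rejected as a positive value but accepted as INT64_MIN when negated, the same boundary the int8 regression tests probe.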

David

#48

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: Peter Eisentraut (#44)

Re: Non-decimal integer literals

On Tue, 29 Nov 2022 at 03:00, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Fixed in new patch.

There seems to be a small bug in the pg_strtointXX functions in the
code that checks that there's at least 1 digit. This causes 0x to be
a valid representation of zero. That does not seem to be allowed by
the parser, so I think we should likely reject it in COPY too.

-- Does not work.
postgres=# select 0x + 1;
ERROR: invalid hexadecimal integer at or near "0x"
LINE 1: select 0x + 1;

postgres=# create table a (a int);
CREATE TABLE

-- probably shouldn't work
postgres=# copy a from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0x
\.

COPY 1

David

#49

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: John Naylor (#39)

Re: Non-decimal integer literals

On Wed, 23 Nov 2022 at 22:19, John Naylor <john.naylor@enterprisedb.com> wrote:

On Wed, Nov 23, 2022 at 3:54 PM David Rowley <dgrowleyml@gmail.com> wrote:

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc. It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

FWIW, gcc 12.2 generates an imul on my system when compiling in situ.

I spent a bit more time trying to figure out why the compiler does
imul instead of bit shifting and it just seems to be down to a
combination of signed-ness plus the overflow check. See [1]. Neither
of the two compilers I tested could use bit shifting with a signed
type when overflow checking is done, which is what we're doing in the
new code.

In clang 15, multiplication is done in both smultiply16 and
umultiply16. These both check for overflow. The versions without the
overflow checks both use bit shifting. With GCC, only smultiply16 does
multiplication. The other 3 variants all use bit shifting.
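The comparison above can be reproduced with four small functions like the following. The names are hypothetical, and these are not the functions behind the godbolt link. Compiling with something like `gcc -O2 -S` or `clang -O2 -S` and comparing the checked and unchecked variants shows which ones get a shift and which get an imul; the exact output depends on compiler version and flags. `__builtin_mul_overflow` is the GCC/Clang builtin that PostgreSQL's overflow helpers in common/int.h use when HAVE__BUILTIN_OP_OVERFLOW is defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unchecked signed multiply; the caller must ensure no overflow (UB). */
int64_t
smul16_unchecked(int64_t x)
{
	return x * 16;
}

/* Unchecked unsigned multiply; wraps modulo 2^64 by definition. */
uint64_t
umul16_unchecked(uint64_t x)
{
	return x * 16;
}

/* Checked signed multiply: returns false if x * 16 overflows int64_t. */
bool
smul16_checked(int64_t x, int64_t *result)
{
	return !__builtin_mul_overflow(x, (int64_t) 16, result);
}

/* Checked unsigned multiply: returns false if x * 16 overflows uint64_t. */
bool
umul16_checked(uint64_t x, uint64_t *result)
{
	return !__builtin_mul_overflow(x, (uint64_t) 16, result);
}
```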

David

[1]: https://godbolt.org/z/EG9jKMjq5

#50

Dean Rasheed

dean.a.rasheed@gmail.com

about 3 years ago

In reply to: David Rowley (#49)

Re: Non-decimal integer literals

On Wed, 30 Nov 2022 at 05:50, David Rowley <dgrowleyml@gmail.com> wrote:

I spent a bit more time trying to figure out why the compiler does
imul instead of bit shifting and it just seems to be down to a
combination of signed-ness plus the overflow check. See [1]. Neither
of the two compilers I tested could use bit shifting with a signed
type when overflow checking is done, which is what we're doing in the
new code.

Ah, interesting. That makes me think that it might be possible to get
some performance gains for all bases (including 10) by separating the
overflow check from the multiplication, and giving the compiler the
best chance to decide on the optimal way to do the multiplication. For
example, on my Intel box, GCC prefers a pair of LEA instructions over
an IMUL, to multiply by 10.
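The LEA pair computes x * 10 as 5x (x + 4x) and then 5x + 5x. The C rendering below has hypothetical function names and only illustrates that identity. The real instruction choice is up to the compiler. Unsigned arithmetic keeps wraparound well defined, so the two forms agree for every input.

```c
#include <stdint.h>

/* x * 10 in the shape of two LEAs: t = x + 4*x, then t + t. */
static uint32_t
times10_lea_style(uint32_t x)
{
	uint32_t	t = x + (x << 2);	/* 5x */

	return t + t;				/* 10x, modulo 2^32 */
}

/* The plain multiply the compiler is free to lower however it likes. */
static uint32_t
times10_plain(uint32_t x)
{
	return x * 10;
}
```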

I like your previous idea of using an unsigned integer for the
accumulator, because then the overflow check in the loop doesn't need
to be exact, as long as an exact check is done later. That way, there
are fewer conditional branches in the loop, and the possibility for
the compiler to choose the fastest multiplication method. So something
like:

// Accumulate positive value using unsigned int, with approximate
// overflow check. If acc >= 1 - INT_MIN / 10, then acc * 10 is
// sure to exceed -INT_MIN.
unsigned int cutoff = 1 - INT_MIN / 10;
unsigned int acc = 0;

while (*ptr && isdigit((unsigned char) *ptr))
{
	if (unlikely(acc >= cutoff))
		goto out_of_range;
	acc = acc * 10 + (*ptr - '0');
	ptr++;
}

and similar for other bases, allowing the coding for all bases to be
kept similar.

I think it's probably best to consider this as a follow-on patch
though. It shouldn't delay getting the main feature committed.
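Filled out into something compilable, the base-10 version of this suggestion might look like the following. The function name parse_dec_int is hypothetical, a 32-bit int is assumed, and the sign handling and the exact range check after the loop are added here to complete the snippet above. Because acc < cutoff implies acc * 10 + 9 still fits in an unsigned int, the loop never wraps. The single exact comparison afterwards catches values between INT_MAX and the cutoff.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical sketch (not PostgreSQL code): parse an optionally negative
 * decimal int, returning false on bad syntax or overflow.
 */
static bool
parse_dec_int(const char *ptr, int *result)
{
	/* acc >= cutoff guarantees acc * 10 > -INT_MIN, i.e. out of range */
	const unsigned int cutoff = 1 - INT_MIN / 10;
	unsigned int acc = 0;
	bool		neg = false;
	const char *firstdigit;

	if (*ptr == '-')
	{
		neg = true;
		ptr++;
	}

	firstdigit = ptr;
	while (isdigit((unsigned char) *ptr))
	{
		/* approximate check: a single compare per digit */
		if (acc >= cutoff)
			return false;
		acc = acc * 10 + (unsigned int) (*ptr - '0');
		ptr++;
	}
	if (ptr == firstdigit || *ptr != '\0')
		return false;

	/* exact check, done once after the loop */
	if (neg)
	{
		if (acc > (unsigned int) INT_MAX + 1)
			return false;
		*result = (int) (0u - acc);	/* assumes two's complement */
	}
	else
	{
		if (acc > (unsigned int) INT_MAX)
			return false;
		*result = (int) acc;
	}
	return true;
}
```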

Regards,
Dean

#51

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: Dean Rasheed (#50)

Re: Non-decimal integer literals

On Thu, 1 Dec 2022 at 00:34, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

So something
like:

// Accumulate positive value using unsigned int, with approximate
// overflow check. If acc >= 1 - INT_MIN / 10, then acc * 10 is
// sure to exceed -INT_MIN.
unsigned int cutoff = 1 - INT_MIN / 10;
unsigned int acc = 0;

while (*ptr && isdigit((unsigned char) *ptr))
{
	if (unlikely(acc >= cutoff))
		goto out_of_range;
	acc = acc * 10 + (*ptr - '0');
	ptr++;
}

and similar for other bases, allowing the coding for all bases to be
kept similar.

Seems like a good idea to me. Couldn't the cutoff check just be "acc >
INT_MAX / 10"?
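One way to check the question is to compute both cutoffs, assuming a 32-bit int. For base 10 they coincide, because INT_MAX (2147483647) does not end in 9, so INT_MIN / 10 and INT_MAX / 10 have the same magnitude. For the power-of-two bases they differ by one. INT_MAX / 16 + 1 is 0x8000000, so the test acc > INT_MAX / 16 would reject the accumulator just before the last digit of -0x80000000, which is INT_MIN and should be accepted. The function names below are hypothetical.

```c
#include <limits.h>

/* Cutoff in the "acc >= 1 - INT_MIN / base" form. */
static unsigned int
cutoff_from_min(int base)
{
	return (unsigned int) (1 - INT_MIN / base);
}

/* Cutoff implied by "acc > INT_MAX / base", i.e. acc >= INT_MAX / base + 1. */
static unsigned int
cutoff_from_max(int base)
{
	return (unsigned int) (INT_MAX / base + 1);
}
```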

I think it's probably best to consider this as a follow-on patch
though. It shouldn't delay getting the main feature committed.

I agree that it should be a separate patch. But thinking about what
Tom mentioned in [1], I had in mind that this patch would need to wait
until the new standard is out so that we have a more genuine reason
for breaking existing queries.

I've drafted up a full patch for improving the current base-10 code,
so I'll go post that on another thread.

David

[1]: /messages/by-id/3260805.1631106874@sss.pgh.pa.us

#52

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: David Rowley (#51)

Re: Non-decimal integer literals

David Rowley <dgrowleyml@gmail.com> writes:

I agree that it should be a separate patch. But thinking about what
Tom mentioned in [1], I had in mind this patch would need to wait
until the new standard is out so that we have a more genuine reason
for breaking existing queries.

Well, we already broke them in v15: that example now gives

regression=# select 0x42e;
ERROR: trailing junk after numeric literal at or near "0x"
LINE 1: select 0x42e;
               ^

So there's probably no compatibility reason not to drop the
other shoe.

regards, tom lane

#53

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: David Rowley (#48)

1 attachment(s)

Re: Non-decimal integer literals

On 29.11.22 21:22, David Rowley wrote:

There seems to be a small bug in the pg_strtointXX functions in the
code that checks that there's at least 1 digit. This causes 0x to be
a valid representation of zero. That does not seem to be allowed by
the parser, so I think we should likely reject it in COPY too.
-- probably shouldn't work
postgres=# copy a from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0x
\.

COPY 1

Fixed in new patch. I moved the "require at least one digit" checks
after the loops over the digits, to make it easier to write one check
for all bases.
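A rough illustration of doing the digit-count check once, after the digit loop, follows. The names digit_value and parse_int32 are hypothetical and <stdint.h> types are used. Unlike the real pg_strtoint32, it uses an exact per-digit overflow test and does not allow trailing whitespace. It picks the base from the prefix, consumes digits of that base, and only then tests whether any digit was seen, so a bare 0x fails the same way a bare - does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: value of c as a digit in the given base, or -1. */
static int
digit_value(char c, int base)
{
	int			v;

	if (c >= '0' && c <= '9')
		v = c - '0';
	else if (c >= 'a' && c <= 'f')
		v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		v = c - 'A' + 10;
	else
		return -1;
	return v < base ? v : -1;
}

/* Hypothetical sketch: optional sign, optional 0x/0o/0b prefix, digits. */
static bool
parse_int32(const char *ptr, int32_t *result)
{
	bool		neg = false;
	int			base = 10;
	const char *firstdigit;
	uint32_t	acc = 0;
	uint32_t	limit;
	int			d;

	if (*ptr == '-')
	{
		neg = true;
		ptr++;
	}
	else if (*ptr == '+')
		ptr++;

	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
		base = 16;
	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
		base = 8;
	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
		base = 2;
	if (base != 10)
		ptr += 2;

	/* largest magnitude allowed: 2147483647, or 2147483648 if negative */
	limit = neg ? (uint32_t) INT32_MAX + 1 : (uint32_t) INT32_MAX;

	firstdigit = ptr;
	while ((d = digit_value(*ptr, base)) >= 0)
	{
		/* exact: acc * base + d must not exceed limit */
		if (acc > (limit - (uint32_t) d) / (uint32_t) base)
			return false;
		acc = acc * (uint32_t) base + (uint32_t) d;
		ptr++;
	}

	/* one check for all bases: at least one digit, nothing trailing */
	if (ptr == firstdigit || *ptr != '\0')
		return false;

	/* the cast of 0 - acc assumes two's complement */
	*result = neg ? (int32_t) (0 - acc) : (int32_t) acc;
	return true;
}
```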

This patch also incorporates your changes to the digit analysis
algorithm. I didn't check it carefully, but all the tests still pass. ;-)

Attachments:

v12-0001-Non-decimal-integer-literals.patch (text/plain; charset=UTF-8)

From 76510f2077d3075653a9bbe899b9d4752953d30e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 8 Dec 2022 12:10:41 +0100
Subject: [PATCH v12] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

    0x42F
    0o273
    0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
---
 doc/src/sgml/syntax.sgml                   |  34 ++++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/parser/parse_node.c            |  37 +++-
 src/backend/parser/scan.l                  | 101 ++++++++---
 src/backend/utils/adt/numutils.c           | 185 +++++++++++++++++---
 src/fe_utils/psqlscan.l                    |  78 +++++++--
 src/interfaces/ecpg/preproc/pgc.l          | 106 ++++++-----
 src/test/regress/expected/int2.out         |  92 ++++++++++
 src/test/regress/expected/int4.out         |  92 ++++++++++
 src/test/regress/expected/int8.out         |  92 ++++++++++
 src/test/regress/expected/numerology.out   | 193 ++++++++++++++++++++-
 src/test/regress/sql/int2.sql              |  26 +++
 src/test/regress/sql/int4.sql              |  26 +++
 src/test/regress/sql/int8.sql              |  26 +++
 src/test/regress/sql/numerology.sql        |  51 +++++-
 16 files changed, 1028 insertions(+), 118 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f..956182e7c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ <title>Numeric Constants</title>
 </literallayout>
     </para>
 
+    <para>
+     Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+     <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+     (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+     digits (0-7), and <replaceable>bindigits</replaceable> is one or more binary
+     digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+     upper or lower case.  Note that only integers can have non-decimal forms,
+     not numbers with fractional parts.
+    </para>
+
+    <para>
+     These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+    </para>
+
+    <note>
+     <para>
+      Nondecimal integer constants are currently only supported in the range
+      of the <type>bigint</type> type (see <xref
+      linkend="datatype-numeric-table"/>).
+     </para>
+    </note>
+
     <para>
      <indexterm><primary>integer</primary></indexterm>
      <indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 18725a02d1..95c27a625e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
          WHEN 1700 /*numeric*/ THEN
               CASE WHEN $2 = -1
                    THEN null
-                   ELSE (($2 - 4) >> 16) & 65535
+                   ELSE (($2 - 4) >> 16) & 0xFFFF
                    END
          WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
          WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1700) THEN
             CASE WHEN $2 = -1
                  THEN null
-                 ELSE ($2 - 4) & 65535
+                 ELSE ($2 - 4) & 0xFFFF
                  END
        ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
            THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
        WHEN $1 IN (1186) /* interval */
-           THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+           THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
        ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 8704a42b60..abad216b7e 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652	SQL-dynamic statements in SQL routines			NO
 T653	SQL-schema statements in external routines			YES	
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
+T661	Non-decimal integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 4014db4b80..d33e3c179d 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -385,11 +385,46 @@ make_const(ParseState *pstate, A_Const *aconst)
 			{
 				/* could be an oversize integer as well as a float ... */
 
+				int			base = 10;
+				char	   *startptr;
+				int			sign;
+				char	   *testvalue;
 				int64		val64;
 				char	   *endptr;
 
+				startptr = aconst->val.fval.fval;
+				if (startptr[0] == '-')
+				{
+					sign = -1;
+					startptr++;
+				}
+				else
+					sign = +1;
+				if (startptr[0] == '0')
+				{
+					if (startptr[1] == 'b' || startptr[1] == 'B')
+					{
+						base = 2;
+						startptr += 2;
+					}
+					else if (startptr[1] == 'o' || startptr[1] == 'O')
+					{
+						base = 8;
+						startptr += 2;
+					}
+					else if (startptr[1] == 'x' || startptr[1] == 'X')
+					{
+						base = 16;
+						startptr += 2;
+					}
+				}
+
+				if (sign == +1)
+					testvalue = startptr;
+				else
+					testvalue = psprintf("-%s", startptr);
 				errno = 0;
-				val64 = strtoi64(aconst->val.fval.fval, &endptr, 10);
+				val64 = strtoi64(testvalue, &endptr, base);
 				if (errno == 0 && *endptr == '\0')
 				{
 					/*
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8eb..9ad9e0c8ba 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -385,25 +385,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 other			.
 
@@ -983,20 +998,44 @@ other			.
 					yyerror("trailing junk after parameter");
 				}
 
-{integer}		{
+{decinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 10);
+				}
+{hexinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 16);
+				}
+{octinteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 8);
+				}
+{bininteger}	{
+					SET_YYLLOC();
+					return process_integer_literal(yytext, yylval, 2);
+				}
+{hexfail}		{
+					SET_YYLLOC();
+					yyerror("invalid hexadecimal integer");
+				}
+{octfail}		{
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					yyerror("invalid octal integer");
 				}
-{decimal}		{
+{binfail}		{
+					SET_YYLLOC();
+					yyerror("invalid binary integer");
+				}
+{numeric}		{
 					SET_YYLLOC();
 					yylval->str = pstrdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					SET_YYLLOC();
-					return process_integer_literal(yytext, yylval);
+					return process_integer_literal(yytext, yylval, 10);
 				}
 {real}			{
 					SET_YYLLOC();
@@ -1007,11 +1046,23 @@ other			.
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{hexinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					SET_YYLLOC();
+					yyerror("trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					SET_YYLLOC();
 					yyerror("trailing junk after numeric literal");
 				}
@@ -1307,17 +1358,17 @@ litbufdup(core_yyscan_t yyscanner)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
- * ie digits and a decimal point.
+ * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
+ * thing with {numeric}, ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index a64422c8d0..2a37f823de 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
 	return t + (v >= PowersOfTen[t]);
 }
 
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
 /*
  * Convert input string to a signed 16 bit integer.
  *
@@ -99,6 +110,7 @@ int16
 pg_strtoint16(const char *s)
 {
 	const char *ptr = s;
+	const char *firstdigit;
 	uint16		tmp = 0;
 	bool		neg = false;
 
@@ -115,19 +127,60 @@ pg_strtoint16(const char *s)
 	else if (*ptr == '+')
 		ptr++;
 
-	/* require at least one digit */
-	if (unlikely(!isdigit((unsigned char) *ptr)))
-		goto invalid_syntax;
-
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		if (unlikely(tmp > -(PG_INT16_MIN / 10)))
-			goto out_of_range;
+		firstdigit = ptr += 2;
+
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			if (unlikely(tmp > -(PG_INT16_MIN / 16)))
+				goto out_of_range;
+
+			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		firstdigit = ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			if (unlikely(tmp > -(PG_INT16_MIN / 8)))
+				goto out_of_range;
+
+			tmp = tmp * 8 + (*ptr++ - '0');
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		firstdigit = ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			if (unlikely(tmp > -(PG_INT16_MIN / 2)))
+				goto out_of_range;
+
+			tmp = tmp * 2 + (*ptr++ - '0');
+		}
+	}
+	else
+	{
+		firstdigit = ptr;
 
-		tmp = tmp * 10 + (*ptr++ - '0');
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			if (unlikely(tmp > -(PG_INT16_MIN / 10)))
+				goto out_of_range;
+
+			tmp = tmp * 10 + (*ptr++ - '0');
+		}
 	}
 
+	/* require at least one digit */
+	if (unlikely(ptr == firstdigit))
+		goto invalid_syntax;
+
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
 		ptr++;
@@ -177,6 +230,7 @@ int32
 pg_strtoint32(const char *s)
 {
 	const char *ptr = s;
+	const char *firstdigit;
 	uint32		tmp = 0;
 	bool		neg = false;
 
@@ -193,19 +247,60 @@ pg_strtoint32(const char *s)
 	else if (*ptr == '+')
 		ptr++;
 
-	/* require at least one digit */
-	if (unlikely(!isdigit((unsigned char) *ptr)))
-		goto invalid_syntax;
-
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		if (unlikely(tmp > -(PG_INT32_MIN / 10)))
-			goto out_of_range;
+		firstdigit = ptr += 2;
+
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			if (unlikely(tmp > -(PG_INT32_MIN / 16)))
+				goto out_of_range;
+
+			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		firstdigit = ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			if (unlikely(tmp > -(PG_INT32_MIN / 8)))
+				goto out_of_range;
+
+			tmp = tmp * 8 + (*ptr++ - '0');
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		firstdigit = ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			if (unlikely(tmp > -(PG_INT32_MIN / 2)))
+				goto out_of_range;
+
+			tmp = tmp * 2 + (*ptr++ - '0');
+		}
+	}
+	else
+	{
+		firstdigit = ptr;
+
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			if (unlikely(tmp > -(PG_INT32_MIN / 10)))
+				goto out_of_range;
 
-		tmp = tmp * 10 + (*ptr++ - '0');
+			tmp = tmp * 10 + (*ptr++ - '0');
+		}
 	}
 
+	/* require at least one digit */
+	if (unlikely(ptr == firstdigit))
+		goto invalid_syntax;
+
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
 		ptr++;
@@ -255,6 +350,7 @@ int64
 pg_strtoint64(const char *s)
 {
 	const char *ptr = s;
+	const char *firstdigit;
 	uint64		tmp = 0;
 	bool		neg = false;
 
@@ -271,18 +367,59 @@ pg_strtoint64(const char *s)
 	else if (*ptr == '+')
 		ptr++;
 
-	/* require at least one digit */
-	if (unlikely(!isdigit((unsigned char) *ptr)))
-		goto invalid_syntax;
-
 	/* process digits */
-	while (*ptr && isdigit((unsigned char) *ptr))
+	if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
 	{
-		if (unlikely(tmp > -(PG_INT64_MIN / 10)))
-			goto out_of_range;
+		firstdigit = ptr += 2;
+
+		while (*ptr && isxdigit((unsigned char) *ptr))
+		{
+			if (unlikely(tmp > -(PG_INT64_MIN / 16)))
+				goto out_of_range;
 
-		tmp = tmp * 10 + (*ptr++ - '0');
+			tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+		}
 	}
+	else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+	{
+		firstdigit = ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+		{
+			if (unlikely(tmp > -(PG_INT64_MIN / 8)))
+				goto out_of_range;
+
+			tmp = tmp * 8 + (*ptr++ - '0');
+		}
+	}
+	else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+	{
+		firstdigit = ptr += 2;
+
+		while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+		{
+			if (unlikely(tmp > -(PG_INT64_MIN / 2)))
+				goto out_of_range;
+
+			tmp = tmp * 2 + (*ptr++ - '0');
+		}
+	}
+	else
+	{
+		firstdigit = ptr;
+
+		while (*ptr && isdigit((unsigned char) *ptr))
+		{
+			if (unlikely(tmp > -(PG_INT64_MIN / 10)))
+				goto out_of_range;
+
+			tmp = tmp * 10 + (*ptr++ - '0');
+		}
+	}
+
+	/* require at least one digit */
+	if (unlikely(ptr == firstdigit))
+		goto invalid_syntax;
 
 	/* allow trailing whitespace, but not other trailing chars */
 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec240..cb1fc52138 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,25 +323,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* psql-specific: characters allowed in variable names */
 variable_char	[A-Za-z\200-\377_0-9]
@@ -847,13 +862,31 @@ other			.
 					ECHO;
 				}
 
-{integer}		{
+{decinteger}	{
+					ECHO;
+				}
+{hexinteger}	{
+					ECHO;
+				}
+{octinteger}	{
+					ECHO;
+				}
+{bininteger}	{
+					ECHO;
+				}
+{hexfail}		{
 					ECHO;
 				}
-{decimal}		{
+{octfail}		{
 					ECHO;
 				}
-{decimalfail}	{
+{binfail}		{
+					ECHO;
+				}
+{numeric}		{
+					ECHO;
+				}
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
 					ECHO;
@@ -864,10 +897,19 @@ other			.
 {realfail}		{
 					ECHO;
 				}
-{integer_junk}	{
+{decinteger_junk}	{
+					ECHO;
+				}
+{hexinteger_junk}	{
+					ECHO;
+				}
+{octinteger_junk}	{
+					ECHO;
+				}
+{bininteger_junk}	{
 					ECHO;
 				}
-{decimal_junk}	{
+{numeric_junk}	{
 					ECHO;
 				}
 {real_junk}		{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index c145c9698f..2c09c6cb4f 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool		include_next;
 #define startlit()	(literalbuf[0] = '\0', literallen = 0)
 static void addlit(char *ytext, int yleng);
 static void addlitchar(unsigned char ychar);
-static int	process_integer_literal(const char *token, YYSTYPE *lval);
+static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void parse_include(void);
 static bool ecpg_isspace(char ch);
 static bool isdefine(void);
@@ -351,25 +351,40 @@ operator		{op_chars}+
  * Unary minus is not part of a number here.  Instead we pass it separately to
  * the parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail} is added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
-digit			[0-9]
-
-integer			{digit}+
-decimal			(({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail		{digit}+\.\.
-real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail		({integer}|{decimal})[Ee][-+]
-
-integer_junk	{integer}{ident_start}
-decimal_junk	{decimal}{ident_start}
+decdigit		[0-9]
+hexdigit		[0-9A-Fa-f]
+octdigit		[0-7]
+bindigit		[0-1]
+
+decinteger		{decdigit}+
+hexinteger		0[xX]{hexdigit}+
+octinteger		0[oO]{octdigit}+
+bininteger		0[bB]{bindigit}+
+
+hexfail			0[xX]
+octfail			0[oO]
+binfail			0[bB]
+
+numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail		{decdigit}+\.\.
+
+real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail		({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk	{decinteger}{ident_start}
+hexinteger_junk	{hexinteger}{ident_start}
+octinteger_junk	{octinteger}{ident_start}
+bininteger_junk	{bininteger}{ident_start}
+numeric_junk	{numeric}{ident_start}
 real_junk		{real}{ident_start}
 
-param			\${integer}
-param_junk		\${integer}{ident_start}
+param			\${decinteger}
+param_junk		\${decinteger}{ident_start}
 
 /* special characters for other dbms */
 /* we have to react differently in compat mode */
@@ -399,9 +414,6 @@ include_next	[iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
 import			[iI][mM][pP][oO][rR][tT]
 undef			[uU][nN][dD][eE][fF]
 
-/* C version of hex number */
-xch				0[xX][0-9A-Fa-f]*
-
 ccomment		"//".*\n
 
 if				[iI][fF]
@@ -414,7 +426,7 @@ endif			[eE][nN][dD][iI][fF]
 struct			[sS][tT][rR][uU][cC][tT]
 
 exec_sql		{exec}{space}*{sql}{space}*
-ipdigit			({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit			({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
 ip				{ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
 
 /* we might want to parse all cpp include files */
@@ -932,17 +944,20 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 }  /* <SQL> */
 
 <C,SQL>{
-{integer}		{
-					return process_integer_literal(yytext, &base_yylval);
+{decinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
-{decimal}		{
+{hexinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 16);
+				}
+{numeric}		{
 					base_yylval.str = mm_strdup(yytext);
 					return FCONST;
 				}
-{decimalfail}	{
+{numericfail}	{
 					/* throw back the .., and treat as integer */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 {real}			{
 					base_yylval.str = mm_strdup(yytext);
@@ -951,22 +966,38 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 {realfail}		{
 					/*
 					 * throw back the [Ee][+-], and figure out whether what
-					 * remains is an {integer} or {decimal}.
+					 * remains is a {decinteger} or {numeric}.
 					 */
 					yyless(yyleng - 2);
-					return process_integer_literal(yytext, &base_yylval);
+					return process_integer_literal(yytext, &base_yylval, 10);
 				}
 } /* <C,SQL> */
 
 <SQL>{
+{octinteger}	{
+					return process_integer_literal(yytext, &base_yylval, 8);
+				}
+{bininteger}	{
+					return process_integer_literal(yytext, &base_yylval, 2);
+				}
+
 	/*
 	 * Note that some trailing junk is valid in C (such as 100LL), so we
 	 * contain this to SQL mode.
 	 */
-{integer_junk}	{
+{decinteger_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
-{decimal_junk}	{
+{hexinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{octinteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{bininteger_junk}	{
+					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+				}
+{numeric_junk}	{
 					mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
 				}
 {real_junk}		{
@@ -1036,19 +1067,6 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 							return S_ANYTHING;
 					 }
 <C>{ccomment}		{ ECHO; }
-<C>{xch}			{
-						char* endptr;
-
-						errno = 0;
-						base_yylval.ival = strtoul((char *) yytext, &endptr, 16);
-						if (*endptr != '\0' || errno == ERANGE)
-						{
-							errno = 0;
-							base_yylval.str = mm_strdup(yytext);
-							return SCONST;
-						}
-						return ICONST;
-					}
 <C>{cppinclude}		{
 						if (system_includes)
 						{
@@ -1573,17 +1591,17 @@ addlitchar(unsigned char ychar)
 }
 
 /*
- * Process {integer}.  Note this will also do the right thing with {decimal},
- * ie digits and a decimal point.
+ * Process {decinteger}, {hexinteger}, etc.  Note this will also do the right
+ * thing with {numeric}, ie digits and a decimal point.
  */
 static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
 	int			val;
 	char	   *endptr;
 
 	errno = 0;
-	val = strtoint(token, &endptr, 10);
+	val = strtoint(base == 10 ? token : token + 2, &endptr, base);
 	if (*endptr != '\0' || errno == ERANGE)
 	{
 		/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 109cf9baaa..87fb369d76 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -304,3 +304,95 @@ FROM (VALUES (-2.5::numeric),
   2.5 |          3
 (7 rows)
 
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2 
+------
+   37
+(1 row)
+
+SELECT int2 '0o273';
+ int2 
+------
+  187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2 
+------
+ 1071
+(1 row)
+
+SELECT int2 '0b';
+ERROR:  invalid input syntax for type smallint: "0b"
+LINE 1: SELECT int2 '0b';
+                    ^
+SELECT int2 '0o';
+ERROR:  invalid input syntax for type smallint: "0o"
+LINE 1: SELECT int2 '0o';
+                    ^
+SELECT int2 '0x';
+ERROR:  invalid input syntax for type smallint: "0x"
+LINE 1: SELECT int2 '0x';
+                    ^
+-- cases near overflow
+SELECT int2 '0b111111111111111';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0b1000000000000000';
+ERROR:  value "0b1000000000000000" is out of range for type smallint
+LINE 1: SELECT int2 '0b1000000000000000';
+                    ^
+SELECT int2 '0o77777';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0o100000';
+ERROR:  value "0o100000" is out of range for type smallint
+LINE 1: SELECT int2 '0o100000';
+                    ^
+SELECT int2 '0x7FFF';
+ int2  
+-------
+ 32767
+(1 row)
+
+SELECT int2 '0x8000';
+ERROR:  value "0x8000" is out of range for type smallint
+LINE 1: SELECT int2 '0x8000';
+                    ^
+SELECT int2 '-0b1000000000000000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0b1000000000000001';
+ERROR:  value "-0b1000000000000001" is out of range for type smallint
+LINE 1: SELECT int2 '-0b1000000000000001';
+                    ^
+SELECT int2 '-0o100000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0o100001';
+ERROR:  value "-0o100001" is out of range for type smallint
+LINE 1: SELECT int2 '-0o100001';
+                    ^
+SELECT int2 '-0x8000';
+  int2  
+--------
+ -32768
+(1 row)
+
+SELECT int2 '-0x8001';
+ERROR:  value "-0x8001" is out of range for type smallint
+LINE 1: SELECT int2 '-0x8001';
+                    ^
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index fbcc0e8d9e..e05038a4a4 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -431,3 +431,95 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 ERROR:  integer out of range
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
 ERROR:  integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4 
+------
+   37
+(1 row)
+
+SELECT int4 '0o273';
+ int4 
+------
+  187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4 
+------
+ 1071
+(1 row)
+
+SELECT int4 '0b';
+ERROR:  invalid input syntax for type integer: "0b"
+LINE 1: SELECT int4 '0b';
+                    ^
+SELECT int4 '0o';
+ERROR:  invalid input syntax for type integer: "0o"
+LINE 1: SELECT int4 '0o';
+                    ^
+SELECT int4 '0x';
+ERROR:  invalid input syntax for type integer: "0x"
+LINE 1: SELECT int4 '0x';
+                    ^
+-- cases near overflow
+SELECT int4 '0b1111111111111111111111111111111';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0b10000000000000000000000000000000';
+ERROR:  value "0b10000000000000000000000000000000" is out of range for type integer
+LINE 1: SELECT int4 '0b10000000000000000000000000000000';
+                    ^
+SELECT int4 '0o17777777777';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0o20000000000';
+ERROR:  value "0o20000000000" is out of range for type integer
+LINE 1: SELECT int4 '0o20000000000';
+                    ^
+SELECT int4 '0x7FFFFFFF';
+    int4    
+------------
+ 2147483647
+(1 row)
+
+SELECT int4 '0x80000000';
+ERROR:  value "0x80000000" is out of range for type integer
+LINE 1: SELECT int4 '0x80000000';
+                    ^
+SELECT int4 '-0b10000000000000000000000000000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0b10000000000000000000000000000001';
+ERROR:  value "-0b10000000000000000000000000000001" is out of range for type integer
+LINE 1: SELECT int4 '-0b10000000000000000000000000000001';
+                    ^
+SELECT int4 '-0o20000000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0o20000000001';
+ERROR:  value "-0o20000000001" is out of range for type integer
+LINE 1: SELECT int4 '-0o20000000001';
+                    ^
+SELECT int4 '-0x80000000';
+    int4     
+-------------
+ -2147483648
+(1 row)
+
+SELECT int4 '-0x80000001';
+ERROR:  value "-0x80000001" is out of range for type integer
+LINE 1: SELECT int4 '-0x80000001';
+                    ^
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 1ae23cf3f9..244cef48f5 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -927,3 +927,95 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 ERROR:  bigint out of range
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
 ERROR:  bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8 
+------
+   37
+(1 row)
+
+SELECT int8 '0o273';
+ int8 
+------
+  187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8 
+------
+ 1071
+(1 row)
+
+SELECT int8 '0b';
+ERROR:  invalid input syntax for type bigint: "0b"
+LINE 1: SELECT int8 '0b';
+                    ^
+SELECT int8 '0o';
+ERROR:  invalid input syntax for type bigint: "0o"
+LINE 1: SELECT int8 '0o';
+                    ^
+SELECT int8 '0x';
+ERROR:  invalid input syntax for type bigint: "0x"
+LINE 1: SELECT int8 '0x';
+                    ^
+-- cases near overflow
+SELECT int8 '0b111111111111111111111111111111111111111111111111111111111111111';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0b1000000000000000000000000000000000000000000000000000000000000000';
+ERROR:  value "0b1000000000000000000000000000000000000000000000000000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0b100000000000000000000000000000000000000000000...
+                    ^
+SELECT int8 '0o777777777777777777777';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0o1000000000000000000000';
+ERROR:  value "0o1000000000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0o1000000000000000000000';
+                    ^
+SELECT int8 '0x7FFFFFFFFFFFFFFF';
+        int8         
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT int8 '0x8000000000000000';
+ERROR:  value "0x8000000000000000" is out of range for type bigint
+LINE 1: SELECT int8 '0x8000000000000000';
+                    ^
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000001';
+ERROR:  value "-0b1000000000000000000000000000000000000000000000000000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0b10000000000000000000000000000000000000000000...
+                    ^
+SELECT int8 '-0o1000000000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0o1000000000000000000001';
+ERROR:  value "-0o1000000000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0o1000000000000000000001';
+                    ^
+SELECT int8 '-0x8000000000000000';
+         int8         
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT int8 '-0x8000000000000001';
+ERROR:  value "-0x8000000000000001" is out of range for type bigint
+LINE 1: SELECT int8 '-0x8000000000000001';
+                    ^
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 77d4843417..15cd6b1672 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,167 @@
 -- Test various combinations of numeric types and functions.
 --
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
+SELECT 0b100101;
+ ?column? 
+----------
+       37
+(1 row)
+
+SELECT 0o273;
+ ?column? 
+----------
+      187
+(1 row)
+
+SELECT 0x42F;
+ ?column? 
+----------
+     1071
+(1 row)
+
+-- cases near int4 overflow
+SELECT 0b1111111111111111111111111111111;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0b10000000000000000000000000000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT 0o17777777777;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0o20000000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT 0x7FFFFFFF;
+  ?column?  
+------------
+ 2147483647
+(1 row)
+
+SELECT 0x80000000;
+  ?column?  
+------------
+ 2147483648
+(1 row)
+
+SELECT -0b10000000000000000000000000000000;
+  ?column?   
+-------------
+ -2147483648
+(1 row)
+
+SELECT -0b10000000000000000000000000000001;
+  ?column?   
+-------------
+ -2147483649
+(1 row)
+
+SELECT -0o20000000000;
+  ?column?   
+-------------
+ -2147483648
+(1 row)
+
+SELECT -0o20000000001;
+  ?column?   
+-------------
+ -2147483649
+(1 row)
+
+SELECT -0x80000000;
+  ?column?   
+-------------
+ -2147483648
+(1 row)
+
+SELECT -0x80000001;
+  ?column?   
+-------------
+ -2147483649
+(1 row)
+
+-- cases near int8 overflow
+SELECT 0b111111111111111111111111111111111111111111111111111111111111111;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0b1000000000000000000000000000000000000000000000000000000000000000;
+ERROR:  invalid input syntax for type numeric: "0b1000000000000000000000000000000000000000000000000000000000000000"
+LINE 1: SELECT 0b100000000000000000000000000000000000000000000000000...
+               ^
+SELECT 0o777777777777777777777;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0o1000000000000000000000;
+ERROR:  invalid input syntax for type numeric: "0o1000000000000000000000"
+LINE 1: SELECT 0o1000000000000000000000;
+               ^
+SELECT 0x7FFFFFFFFFFFFFFF;
+      ?column?       
+---------------------
+ 9223372036854775807
+(1 row)
+
+SELECT 0x8000000000000000;
+ERROR:  invalid input syntax for type numeric: "0x8000000000000000"
+LINE 1: SELECT 0x8000000000000000;
+               ^
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000000;
+       ?column?       
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000001;
+ERROR:  invalid input syntax for type numeric: "-0b1000000000000000000000000000000000000000000000000000000000000001"
+LINE 1: SELECT -0b10000000000000000000000000000000000000000000000000...
+               ^
+SELECT -0o1000000000000000000000;
+       ?column?       
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT -0o1000000000000000000001;
+ERROR:  invalid input syntax for type numeric: "-0o1000000000000000000001"
+LINE 1: SELECT -0o1000000000000000000001;
+               ^
+SELECT -0x8000000000000000;
+       ?column?       
+----------------------
+ -9223372036854775808
+(1 row)
+
+SELECT -0x8000000000000001;
+ERROR:  invalid input syntax for type numeric: "-0x8000000000000001"
+LINE 1: SELECT -0x8000000000000001;
+               ^
+-- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
 LINE 1: SELECT 123abc;
                ^
 SELECT 0x0o;
-ERROR:  trailing junk after numeric literal at or near "0x"
+ERROR:  trailing junk after numeric literal at or near "0x0o"
 LINE 1: SELECT 0x0o;
                ^
 SELECT 1_2_3;
@@ -45,6 +198,42 @@ PREPARE p1 AS SELECT $1a;
 ERROR:  trailing junk after parameter at or near "$1a"
 LINE 1: PREPARE p1 AS SELECT $1a;
                              ^
+SELECT 0b;
+ERROR:  invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+               ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^
+SELECT 0b0x;
+ERROR:  trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+               ^
+SELECT 0o;
+ERROR:  invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+               ^
+SELECT 1o;
+ERROR:  trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+               ^
+SELECT 0o0x;
+ERROR:  trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+               ^
+SELECT 0x;
+ERROR:  invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+               ^
+SELECT 1x;
+ERROR:  trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+               ^
+SELECT 0x0y;
+ERROR:  trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index ea29066b78..c97f4830e1 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -104,3 +104,29 @@
              (0.5::numeric),
              (1.5::numeric),
              (2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
+
+SELECT int2 '0b';
+SELECT int2 '0o';
+SELECT int2 '0x';
+
+-- cases near overflow
+SELECT int2 '0b111111111111111';
+SELECT int2 '0b1000000000000000';
+SELECT int2 '0o77777';
+SELECT int2 '0o100000';
+SELECT int2 '0x7FFF';
+SELECT int2 '0x8000';
+
+SELECT int2 '-0b1000000000000000';
+SELECT int2 '-0b1000000000000001';
+SELECT int2 '-0o100000';
+SELECT int2 '-0o100001';
+SELECT int2 '-0x8000';
+SELECT int2 '-0x8001';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index f19077f3da..8dfaaac88a 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -164,3 +164,29 @@
 
 SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
 SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
+
+SELECT int4 '0b';
+SELECT int4 '0o';
+SELECT int4 '0x';
+
+-- cases near overflow
+SELECT int4 '0b1111111111111111111111111111111';
+SELECT int4 '0b10000000000000000000000000000000';
+SELECT int4 '0o17777777777';
+SELECT int4 '0o20000000000';
+SELECT int4 '0x7FFFFFFF';
+SELECT int4 '0x80000000';
+
+SELECT int4 '-0b10000000000000000000000000000000';
+SELECT int4 '-0b10000000000000000000000000000001';
+SELECT int4 '-0o20000000000';
+SELECT int4 '-0o20000000001';
+SELECT int4 '-0x80000000';
+SELECT int4 '-0x80000001';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 38b771964d..d6e6d770d1 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -245,3 +245,29 @@
 
 SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
 SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
+
+SELECT int8 '0b';
+SELECT int8 '0o';
+SELECT int8 '0x';
+
+-- cases near overflow
+SELECT int8 '0b111111111111111111111111111111111111111111111111111111111111111';
+SELECT int8 '0b1000000000000000000000000000000000000000000000000000000000000000';
+SELECT int8 '0o777777777777777777777';
+SELECT int8 '0o1000000000000000000000';
+SELECT int8 '0x7FFFFFFFFFFFFFFF';
+SELECT int8 '0x8000000000000000';
+
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000000';
+SELECT int8 '-0b1000000000000000000000000000000000000000000000000000000000000001';
+SELECT int8 '-0o1000000000000000000000';
+SELECT int8 '-0o1000000000000000000001';
+SELECT int8 '-0x8000000000000000';
+SELECT int8 '-0x8000000000000001';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index be7d6dfe0c..310d9e5766 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,46 @@
 -- Test various combinations of numeric types and functions.
 --
 
+
 --
--- Trailing junk in numeric literals
+-- numeric literals
 --
 
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- cases near int4 overflow
+SELECT 0b1111111111111111111111111111111;
+SELECT 0b10000000000000000000000000000000;
+SELECT 0o17777777777;
+SELECT 0o20000000000;
+SELECT 0x7FFFFFFF;
+SELECT 0x80000000;
+
+SELECT -0b10000000000000000000000000000000;
+SELECT -0b10000000000000000000000000000001;
+SELECT -0o20000000000;
+SELECT -0o20000000001;
+SELECT -0x80000000;
+SELECT -0x80000001;
+
+-- cases near int8 overflow
+SELECT 0b111111111111111111111111111111111111111111111111111111111111111;
+SELECT 0b1000000000000000000000000000000000000000000000000000000000000000;
+SELECT 0o777777777777777777777;
+SELECT 0o1000000000000000000000;
+SELECT 0x7FFFFFFFFFFFFFFF;
+SELECT 0x8000000000000000;
+
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000000;
+SELECT -0b1000000000000000000000000000000000000000000000000000000000000001;
+SELECT -0o1000000000000000000000;
+SELECT -0o1000000000000000000001;
+SELECT -0x8000000000000000;
+SELECT -0x8000000000000001;
+
+-- error cases
 SELECT 123abc;
 SELECT 0x0o;
 SELECT 1_2_3;
@@ -18,6 +54,19 @@
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
 
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)

base-commit: 4b3e37993254ed098219e62ceffb1b32fac388cb
-- 
2.38.1

#54

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: Peter Eisentraut (#53)

Re: Non-decimal integer literals

On 08.12.22 12:16, Peter Eisentraut wrote:

On 29.11.22 21:22, David Rowley wrote:

There seems to be a small bug in the pg_strtointXX functions in the
code that checks that there's at least 1 digit. This causes 0x to be
a valid representation of zero. That does not seem to be allowed by
the parser, so I think we should likely reject it in COPY too.
-- probably shouldn't work
postgres=# copy a from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0x
\.

COPY 1

Fixed in new patch. I moved the "require at least one digit" checks
after the loops over the digits, to make it easier to write one check
for all bases.

This patch also incorporates your changes to the digit analysis
algorithm. I didn't check it carefully, but all the tests still pass. ;-)

committed
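For illustration, here is a minimal C sketch of the structure described above. It strips an optional sign and a 0x/0o/0b prefix, runs one digit loop, and only then applies a single "at least one digit" check that covers every base. The names `digit_value()` and `parse_prefixed_int64()` are hypothetical. This is not the committed pg_strtoint64() code, and it skips the leading/trailing whitespace handling that a real input function needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of a decimal/hex digit character, or -1 if it is not one. */
static int
digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical parser (not PostgreSQL's pg_strtoint64()) for an optionally
 * signed int64 with an optional 0x/0o/0b prefix.  Returns false on invalid
 * syntax or out-of-range input.  Whitespace is not handled.
 */
static bool
parse_prefixed_int64(const char *s, int64_t *result)
{
	bool		neg = false;
	int			base = 10;
	uint64_t	mag = 0;		/* magnitude accumulated so far */
	uint64_t	limit;			/* largest magnitude allowed for this sign */
	const char *firstdigit;

	if (*s == '-')
	{
		neg = true;
		s++;
	}
	else if (*s == '+')
		s++;

	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
	{
		base = 16;
		s += 2;
	}
	else if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O'))
	{
		base = 8;
		s += 2;
	}
	else if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
	{
		base = 2;
		s += 2;
	}

	/* -2^63 is representable in int64, +2^63 is not */
	limit = neg ? (uint64_t) INT64_MAX + 1 : (uint64_t) INT64_MAX;

	firstdigit = s;
	for (;;)
	{
		int			d = digit_value(*s);

		if (d < 0 || d >= base)
			break;
		/* reject if mag * base + d would exceed limit */
		if (mag > (limit - (uint64_t) d) / (uint64_t) base)
			return false;
		mag = mag * base + d;
		s++;
	}

	/* One check after the loop covers every base: rejects "0x", "0b", "-" */
	if (s == firstdigit)
		return false;
	if (*s != '\0')
		return false;			/* trailing junk, e.g. "0x0o" */

	if (neg)
		*result = (mag == (uint64_t) INT64_MAX + 1) ? INT64_MIN : -(int64_t) mag;
	else
		*result = (int64_t) mag;
	return true;
}
```

Under these assumptions, "0x" fails the same check as "-" or an empty string. The asymmetric `limit` is why -0x8000000000000000 fits in an int64 while 0x8000000000000000 does not.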

#55

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Peter Eisentraut (#54)

2 attachment(s)

Re: Non-decimal integer literals

On Wed, 14 Dec 2022 at 05:47, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

committed

Now that we have this for integer types, I think it's worth doing for
numeric as well, since the parser will now pass such things through to
numeric_in() when they don't fit in an int64, and it seems plausible
that at least some people might use non-decimal integers beyond
INT64MIN/MAX. Also, without such support in numeric_in(), the feature
looks a little incomplete:

SELECT -0x8000000000000000;
       ?column?
----------------------
 -9223372036854775808
(1 row)

SELECT 0x8000000000000000;
ERROR:  invalid input syntax for type numeric: "0x8000000000000000"
LINE 1: select 0x8000000000000000;
               ^

One concern I had was what the performance would be like. I don't
really expect people to pass in the kinds of truly huge values that
numeric supports, but it can't be ruled out. So I gave it a go, to see
how hard it would be, and what the worst-case performance looks like.
(I included underscore-handling too, so that I could measure that at
the same time.)

The base-conversion algorithm is O(N^2), and the worst case before
overflow is with hex strings with around 108,000 digits, oct strings
with around 145,000 digits, or binary strings with around 435,000
digits. Each of those takes around 400ms to parse on my machine.
That's around the level at which I might consider adding
CHECK_FOR_INTERRUPTS()'s, but I think that it's probably not worth it,
given how unrealistic such huge inputs are in practice.
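To make the O(N^2) claim concrete, here is a rough C sketch of schoolbook base conversion into base-10000 limbs. The choice of 10000 follows numeric.c's NBASE in the default build, which is an assumption of this sketch. The helper names are hypothetical, and this is not the code in the attached patch. The point is only that each input digit multiplies the entire accumulated value by the base, so every existing limb is touched once per input digit.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NBASE 10000		/* assumed limb base, like numeric.c's NBASE */

/* Value of a hex digit character, or -1 if it is not one. */
static int
hex_digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: convert an unprefixed digit string in base 2..16 into
 * little-endian base-10000 limbs.  Each input digit performs
 * value = value * base + digit over all limbs produced so far, so N input
 * digits cost O(N^2) limb operations in total.  Returns false on an empty
 * string, a digit invalid for the base, or insufficient room in limbs[].
 * Zero is represented as *nlimbs == 0.
 */
static bool
digits_to_nbase(const char *s, int base, int *limbs, size_t maxlimbs,
				size_t *nlimbs)
{
	size_t		n = 0;

	if (*s == '\0')
		return false;

	for (; *s; s++)
	{
		int			d = hex_digit_value(*s);
		int			carry;

		if (d < 0 || d >= base)
			return false;
		carry = d;

		/* multiply every existing limb by base, propagating the carry */
		for (size_t i = 0; i < n; i++)
		{
			int			t = limbs[i] * base + carry;

			limbs[i] = t % SKETCH_NBASE;
			carry = t / SKETCH_NBASE;
		}
		/* any remaining carry becomes new high-order limbs */
		while (carry > 0)
		{
			if (n == maxlimbs)
				return false;
			limbs[n++] = carry % SKETCH_NBASE;
			carry /= SKETCH_NBASE;
		}
	}
	*nlimbs = n;
	return true;
}
```

As a sanity check on the digit counts above: numeric accepts up to 131072 digits before the decimal point, and an N-digit hex string has about N * log10(16), or roughly 1.204N, decimal digits. Overflow therefore sets in near 131072 / 1.204, or about 108,900 hex digits. The same estimate gives about 145,100 octal digits (log10(8) is about 0.903) and about 435,400 binary digits (log10(2) is about 0.301), which agrees with the figures above.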

The other important thing is that this shouldn't impact the
performance when parsing regular decimal inputs. The bulk of the
non-decimal integer parsing is handled by a separate function, which
is called directly from numeric_in(), since non-decimal handling isn't
required at the set_var_from_str() level (used by the float4/8 ->
numeric conversion functions). I also re-arranged the numeric_in()
code somewhat, and was able to make substantial savings by reducing
the number of pg_strncasecmp() calls, and avoiding those calls
entirely for regular numbers that aren't NaN or Inf. Testing that with
COPY with a few million numbers of different sizes, I observed a
10-15% performance increase.
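The NaN/Infinity fast path described above can be sketched in C as follows. After an optional sign, every regular number continues with a digit or '.', so the case-insensitive comparisons are only needed when it doesn't. `classify_numeric_input()` and `starts_with_ci()` are hypothetical stand-ins, the latter for pg_strncasecmp(). The recognized spellings are the ones compared in the attached patch, and checking for trailing junk is left to the caller.

```c
#include <ctype.h>
#include <stdbool.h>

typedef enum
{
	NUM_REGULAR,				/* digit or '.' after the sign: parse normally */
	NUM_NAN,
	NUM_POS_INF,
	NUM_NEG_INF,
	NUM_INVALID
} NumKind;

/* ASCII case-insensitive test that s starts with word (hypothetical). */
static bool
starts_with_ci(const char *s, const char *word)
{
	for (; *word; s++, word++)
	{
		/* a '\0' in s never matches, so we stop at the end of s */
		if (tolower((unsigned char) *s) != tolower((unsigned char) *word))
			return false;
	}
	return true;
}

/*
 * Classify the start of a numeric input string.  The string comparisons run
 * only when the character after the optional sign is neither a digit nor
 * '.', so ordinary numbers never reach them.  *endptr is set to where
 * parsing should continue.
 */
static NumKind
classify_numeric_input(const char *str, const char **endptr)
{
	const char *cp = str;
	bool		neg = false;

	if (*cp == '+')
		cp++;
	else if (*cp == '-')
	{
		neg = true;
		cp++;
	}

	if (isdigit((unsigned char) *cp) || *cp == '.')
	{
		*endptr = cp;			/* digits (or a prefix like 0x) start here */
		return NUM_REGULAR;
	}

	if (starts_with_ci(str, "NaN"))	/* NaN must not have a sign */
	{
		*endptr = str + 3;
		return NUM_NAN;
	}
	if (starts_with_ci(cp, "Infinity"))	/* test before its prefix "inf" */
	{
		*endptr = cp + 8;
		return neg ? NUM_NEG_INF : NUM_POS_INF;
	}
	if (starts_with_ci(cp, "inf"))
	{
		*endptr = cp + 3;
		return neg ? NUM_NEG_INF : NUM_POS_INF;
	}
	*endptr = str;
	return NUM_INVALID;
}
```

As in the patch, "NaN" is matched against the unsigned start of the string, so "-NaN" is rejected. "Infinity" is tested before "inf" so that the longer spelling is consumed whole.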

So I'm feeling quite good about the end result -- I set out hoping not
to make performance noticeably worse, but ended up making it
significantly better.

Regards,
Dean

Attachments:

0001-Add-non-decimal-integer-support-to-type-numeric.patch (text/x-patch; charset=US-ASCII)

From f129bcdaeaaa62d8ddaf6a8e6441183f46097687 Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Fri, 13 Jan 2023 09:20:17 +0000
Subject: [PATCH 1/2] Add non-decimal integer support to type numeric.

This enhances the numeric type input function, adding support for
hexadecimal, octal, and binary integers of any size, up to the limits
of the numeric type.

Since 6fcda9aba8, such non-decimal integers have been accepted by the
parser as integer literals and passed through to numeric_in(). This
commit gives numeric_in() the ability to handle them.

While at it, simplify the handling of NaN and infinities, reducing the
number of calls to pg_strncasecmp(), and arrange for pg_strncasecmp()
to not be called at all for regular numbers. This gives a significant
performance improvement for decimal inputs, more than offsetting the
small performance hit of checking for non-decimal input.
---
 src/backend/utils/adt/numeric.c          | 355 +++++++++++++++++++----
 src/test/regress/expected/numeric.out    |  62 +++-
 src/test/regress/expected/numerology.out |  48 +--
 src/test/regress/sql/numeric.sql         |  10 +
 4 files changed, 380 insertions(+), 95 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index a6409ecbee..ed592841dc 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -500,6 +500,11 @@ static void zero_var(NumericVar *var);
 static bool set_var_from_str(const char *str, const char *cp,
 							 NumericVar *dest, const char **endptr,
 							 Node *escontext);
+static bool set_var_from_non_decimal_integer_str(const char *str,
+												 const char *cp, int sign,
+												 int base, NumericVar *dest,
+												 const char **endptr,
+												 Node *escontext);
 static void set_var_from_num(Numeric num, NumericVar *dest);
 static void init_var_from_num(Numeric num, NumericVar *dest);
 static void set_var_from_var(const NumericVar *value, NumericVar *dest);
@@ -625,6 +630,8 @@ numeric_in(PG_FUNCTION_ARGS)
 	Node	   *escontext = fcinfo->context;
 	Numeric		res;
 	const char *cp;
+	const char *numstart;
+	int			sign;
 
 	/* Skip leading spaces */
 	cp = str;
@@ -636,70 +643,130 @@ numeric_in(PG_FUNCTION_ARGS)
 	}
 
 	/*
-	 * Check for NaN and infinities.  We recognize the same strings allowed by
-	 * float8in().
+	 * Process the number's sign. This duplicates logic in set_var_from_str(),
+	 * but it's worth doing here, since it simplifies the handling of
+	 * infinities and non-decimal integers.
 	 */
-	if (pg_strncasecmp(cp, "NaN", 3) == 0)
-	{
-		res = make_result(&const_nan);
-		cp += 3;
-	}
-	else if (pg_strncasecmp(cp, "Infinity", 8) == 0)
-	{
-		res = make_result(&const_pinf);
-		cp += 8;
-	}
-	else if (pg_strncasecmp(cp, "+Infinity", 9) == 0)
-	{
-		res = make_result(&const_pinf);
-		cp += 9;
-	}
-	else if (pg_strncasecmp(cp, "-Infinity", 9) == 0)
-	{
-		res = make_result(&const_ninf);
-		cp += 9;
-	}
-	else if (pg_strncasecmp(cp, "inf", 3) == 0)
-	{
-		res = make_result(&const_pinf);
-		cp += 3;
-	}
-	else if (pg_strncasecmp(cp, "+inf", 4) == 0)
+	numstart = cp;
+	sign = NUMERIC_POS;
+
+	if (*cp == '+')
+		cp++;
+	else if (*cp == '-')
 	{
-		res = make_result(&const_pinf);
-		cp += 4;
+		sign = NUMERIC_NEG;
+		cp++;
 	}
-	else if (pg_strncasecmp(cp, "-inf", 4) == 0)
+
+	/*
+	 * Check for NaN and infinities.  We recognize the same strings allowed by
+	 * float8in().
+	 *
+	 * Since all other legal inputs have a digit or a decimal point after the
+	 * sign, we need only check for NaN/infinity if that's not the case.
+	 */
+	if (!isdigit((unsigned char) *cp) && *cp != '.')
 	{
-		res = make_result(&const_ninf);
-		cp += 4;
+		/*
+		 * The number must be NaN or infinity; anything else can only be a
+		 * syntax error. Note that NaN mustn't have a sign.
+		 */
+		if (pg_strncasecmp(numstart, "NaN", 3) == 0)
+		{
+			res = make_result(&const_nan);
+			cp = numstart + 3;
+		}
+		else if (pg_strncasecmp(cp, "Infinity", 8) == 0)
+		{
+			res = make_result(sign == NUMERIC_POS ? &const_pinf : &const_ninf);
+			cp += 8;
+		}
+		else if (pg_strncasecmp(cp, "inf", 3) == 0)
+		{
+			res = make_result(sign == NUMERIC_POS ? &const_pinf : &const_ninf);
+			cp += 3;
+		}
+		else
+			goto invalid_syntax;
+
+		/*
+		 * Check for trailing junk; there should be nothing left but spaces.
+		 *
+		 * We intentionally do this check before applying the typmod because
+		 * we would like to throw any trailing-junk syntax error before any
+		 * semantic error resulting from apply_typmod_special().
+		 */
+		while (*cp)
+		{
+			if (!isspace((unsigned char) *cp))
+				goto invalid_syntax;
+			cp++;
+		}
+
+		if (!apply_typmod_special(res, typmod, escontext))
+			PG_RETURN_NULL();
 	}
 	else
 	{
 		/*
-		 * Use set_var_from_str() to parse a normal numeric value
+		 * We have a normal numeric value, which may be a non-decimal integer
+		 * or a regular decimal number.
 		 */
 		NumericVar	value;
+		int			base;
 		bool		have_error;
 
 		init_var(&value);
 
-		if (!set_var_from_str(str, cp, &value, &cp, escontext))
-			PG_RETURN_NULL();
+		/*
+		 * Determine the number's base by looking for a non-decimal prefix
+		 * indicator ("0x", "0o", or "0b").
+		 */
+		if (cp[0] == '0')
+		{
+			switch (cp[1])
+			{
+				case 'x':
+				case 'X':
+					base = 16;
+					break;
+				case 'o':
+				case 'O':
+					base = 8;
+					break;
+				case 'b':
+				case 'B':
+					base = 2;
+					break;
+				default:
+					base = 10;
+			}
+		}
+		else
+			base = 10;
+
+		/* Parse the rest of the number and apply the sign */
+		if (base == 10)
+		{
+			if (!set_var_from_str(str, cp, &value, &cp, escontext))
+				PG_RETURN_NULL();
+			value.sign = sign;
+		}
+		else
+		{
+			if (!set_var_from_non_decimal_integer_str(str, cp + 2, sign, base,
+													  &value, &cp, escontext))
+				PG_RETURN_NULL();
+		}
 
 		/*
-		 * We duplicate a few lines of code here because we would like to
-		 * throw any trailing-junk syntax error before any semantic error
-		 * resulting from apply_typmod.  We can't easily fold the two cases
-		 * together because we mustn't apply apply_typmod to a NaN/Inf.
+		 * Should be nothing left but spaces. As above, throw any typmod error
+		 * after finishing syntax check.
 		 */
 		while (*cp)
 		{
 			if (!isspace((unsigned char) *cp))
-				ereturn(escontext, (Datum) 0,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s: \"%s\"",
-								"numeric", str)));
+				goto invalid_syntax;
 			cp++;
 		}
 
@@ -714,26 +781,15 @@ numeric_in(PG_FUNCTION_ARGS)
 					 errmsg("value overflows numeric format")));
 
 		free_var(&value);
-
-		PG_RETURN_NUMERIC(res);
-	}
-
-	/* Should be nothing left but spaces */
-	while (*cp)
-	{
-		if (!isspace((unsigned char) *cp))
-			ereturn(escontext, (Datum) 0,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s: \"%s\"",
-							"numeric", str)));
-		cp++;
 	}
 
-	/* As above, throw any typmod error after finishing syntax check */
-	if (!apply_typmod_special(res, typmod, escontext))
-		PG_RETURN_NULL();
-
 	PG_RETURN_NUMERIC(res);
+
+invalid_syntax:
+	ereturn(escontext, (Datum) 0,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"numeric", str)));
 }
 
 
@@ -6988,6 +7044,183 @@ set_var_from_str(const char *str, const char *cp,
 }
 
 
+/*
+ * Return the numeric value of a single hex digit.
+ */
+static inline int
+xdigit_value(char dig)
+{
+	return dig >= '0' && dig <= '9' ? dig - '0' :
+		dig >= 'a' && dig <= 'f' ? dig - 'a' + 10 :
+		dig >= 'A' && dig <= 'F' ? dig - 'A' + 10 : -1;
+}
+
+/*
+ * set_var_from_non_decimal_integer_str()
+ *
+ *	Parse a string containing a non-decimal integer
+ *
+ * This function does not handle leading or trailing spaces.  It returns
+ * the end+1 position parsed into *endptr, so that caller can check for
+ * trailing spaces/garbage if deemed necessary.
+ *
+ * cp is the place to actually start parsing; str is what to use in error
+ * reports.  The number's sign and base prefix indicator (e.g., "0x") are
+ * assumed to have already been parsed, so cp should point to the number's
+ * first digit in the base specified.
+ *
+ * base is expected to be 2, 8 or 16.
+ *
+ * Returns true on success, false on failure (if escontext points to an
+ * ErrorSaveContext; otherwise errors are thrown).
+ */
+static bool
+set_var_from_non_decimal_integer_str(const char *str, const char *cp, int sign,
+									 int base, NumericVar *dest,
+									 const char **endptr, Node *escontext)
+{
+	const char *firstdigit = cp;
+	int64		tmp;
+	int64		mul;
+	NumericVar	tmp_var;
+
+	init_var(&tmp_var);
+
+	zero_var(dest);
+
+	/* Process digits in groups that fit in an int64 */
+	tmp = 0;
+	mul = 1;
+
+	if (base == 16)
+	{
+		while (*cp)
+		{
+			if (isxdigit((unsigned char) *cp))
+			{
+				if (mul > PG_INT64_MAX / 16)
+				{
+					/* Add the contribution from this group of digits */
+					int64_to_numericvar(mul, &tmp_var);
+					mul_var(dest, &tmp_var, dest, 0);
+					int64_to_numericvar(tmp, &tmp_var);
+					add_var(dest, &tmp_var, dest);
+
+					/* Result will overflow if weight overflows int16 */
+					if (dest->weight > SHRT_MAX)
+						goto out_of_range;
+
+					/* Begin a new group */
+					tmp = 0;
+					mul = 1;
+				}
+
+				tmp = tmp * 16 + xdigit_value(*cp++);
+				mul = mul * 16;
+			}
+			else
+				break;
+		}
+	}
+	else if (base == 8)
+	{
+		while (*cp)
+		{
+			if (*cp >= '0' && *cp <= '7')
+			{
+				if (mul > PG_INT64_MAX / 8)
+				{
+					/* Add the contribution from this group of digits */
+					int64_to_numericvar(mul, &tmp_var);
+					mul_var(dest, &tmp_var, dest, 0);
+					int64_to_numericvar(tmp, &tmp_var);
+					add_var(dest, &tmp_var, dest);
+
+					/* Result will overflow if weight overflows int16 */
+					if (dest->weight > SHRT_MAX)
+						goto out_of_range;
+
+					/* Begin a new group */
+					tmp = 0;
+					mul = 1;
+				}
+
+				tmp = tmp * 8 + (*cp++ - '0');
+				mul = mul * 8;
+			}
+			else
+				break;
+		}
+	}
+	else if (base == 2)
+	{
+		while (*cp)
+		{
+			if (*cp >= '0' && *cp <= '1')
+			{
+				if (mul > PG_INT64_MAX / 2)
+				{
+					/* Add the contribution from this group of digits */
+					int64_to_numericvar(mul, &tmp_var);
+					mul_var(dest, &tmp_var, dest, 0);
+					int64_to_numericvar(tmp, &tmp_var);
+					add_var(dest, &tmp_var, dest);
+
+					/* Result will overflow if weight overflows int16 */
+					if (dest->weight > SHRT_MAX)
+						goto out_of_range;
+
+					/* Begin a new group */
+					tmp = 0;
+					mul = 1;
+				}
+
+				tmp = tmp * 2 + (*cp++ - '0');
+				mul = mul * 2;
+			}
+			else
+				break;
+		}
+	}
+	else
+		/* Should never happen; treat as invalid input */
+		goto invalid_syntax;
+
+	/* Check that we got at least one digit */
+	if (unlikely(cp == firstdigit))
+		goto invalid_syntax;
+
+	/* Add the contribution from the final group of digits */
+	int64_to_numericvar(mul, &tmp_var);
+	mul_var(dest, &tmp_var, dest, 0);
+	int64_to_numericvar(tmp, &tmp_var);
+	add_var(dest, &tmp_var, dest);
+
+	if (dest->weight > SHRT_MAX)
+		goto out_of_range;
+
+	dest->sign = sign;
+
+	free_var(&tmp_var);
+
+	/* Return end+1 position for caller */
+	*endptr = cp;
+
+	return true;
+
+out_of_range:
+	ereturn(escontext, false,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value overflows numeric format")));
+
+invalid_syntax:
+	ereturn(escontext, false,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"numeric", str)));
+}
+
+
 /*
  * set_var_from_num() -
  *
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
index 30a5613ed7..4eccc2086d 100644
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2144,6 +2144,12 @@ INSERT INTO num_input_test(n1) VALUES (' -inf ');
 INSERT INTO num_input_test(n1) VALUES (' Infinity ');
 INSERT INTO num_input_test(n1) VALUES (' +inFinity ');
 INSERT INTO num_input_test(n1) VALUES (' -INFINITY ');
+INSERT INTO num_input_test(n1) VALUES ('0b10001110111100111100001001010');
+INSERT INTO num_input_test(n1) VALUES ('  -0B1010101101010100101010011000110011101011000111110000101011010010  ');
+INSERT INTO num_input_test(n1) VALUES ('  +0o112402761777 ');
+INSERT INTO num_input_test(n1) VALUES ('-0O001255245230633431670261');
+INSERT INTO num_input_test(n1) VALUES ('-0x0000000000000000000000000deadbeef');
+INSERT INTO num_input_test(n1) VALUES (' 0X30b1F33a6DF0bD4E64DF9BdA7D15 ');
 -- bad inputs
 INSERT INTO num_input_test(n1) VALUES ('     ');
 ERROR:  invalid input syntax for type numeric: "     "
@@ -2181,23 +2187,41 @@ INSERT INTO num_input_test(n1) VALUES ('+ infinity');
 ERROR:  invalid input syntax for type numeric: "+ infinity"
 LINE 1: INSERT INTO num_input_test(n1) VALUES ('+ infinity');
                                                ^
+INSERT INTO num_input_test(n1) VALUES ('0b1112');
+ERROR:  invalid input syntax for type numeric: "0b1112"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0b1112');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('0c1112');
+ERROR:  invalid input syntax for type numeric: "0c1112"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0c1112');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('0x12.34');
+ERROR:  invalid input syntax for type numeric: "0x12.34"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x12.34');
+                                               ^
 SELECT * FROM num_input_test;
-    n1     
------------
-       123
-   3245874
-    -93853
-    555.50
-   -555.50
-       NaN
-       NaN
-  Infinity
-  Infinity
- -Infinity
-  Infinity
-  Infinity
- -Infinity
-(13 rows)
+                n1                 
+-----------------------------------
+                               123
+                           3245874
+                            -93853
+                            555.50
+                           -555.50
+                               NaN
+                               NaN
+                          Infinity
+                          Infinity
+                         -Infinity
+                          Infinity
+                          Infinity
+                         -Infinity
+                         299792458
+             -12345678901234567890
+                        9999999999
+             -12345678900987654321
+                       -3735928559
+ 987654321234567898765432123456789
+(19 rows)
 
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('34.5', 'numeric');
@@ -2242,6 +2266,12 @@ SELECT pg_input_error_message('1234.567', 'numeric(7,4)');
  numeric field overflow
 (1 row)
 
+SELECT pg_input_error_message('0x1234.567', 'numeric');
+               pg_input_error_message                
+-----------------------------------------------------
+ invalid input syntax for type numeric: "0x1234.567"
+(1 row)
+
 --
 -- Test precision and scale typemods
 --
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 15cd6b1672..deb26d31c3 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -104,9 +104,11 @@ SELECT 0b111111111111111111111111111111111111111111111111111111111111111;
 (1 row)
 
 SELECT 0b1000000000000000000000000000000000000000000000000000000000000000;
-ERROR:  invalid input syntax for type numeric: "0b1000000000000000000000000000000000000000000000000000000000000000"
-LINE 1: SELECT 0b100000000000000000000000000000000000000000000000000...
-               ^
+      ?column?       
+---------------------
+ 9223372036854775808
+(1 row)
+
 SELECT 0o777777777777777777777;
       ?column?       
 ---------------------
@@ -114,9 +116,11 @@ SELECT 0o777777777777777777777;
 (1 row)
 
 SELECT 0o1000000000000000000000;
-ERROR:  invalid input syntax for type numeric: "0o1000000000000000000000"
-LINE 1: SELECT 0o1000000000000000000000;
-               ^
+      ?column?       
+---------------------
+ 9223372036854775808
+(1 row)
+
 SELECT 0x7FFFFFFFFFFFFFFF;
       ?column?       
 ---------------------
@@ -124,9 +128,11 @@ SELECT 0x7FFFFFFFFFFFFFFF;
 (1 row)
 
 SELECT 0x8000000000000000;
-ERROR:  invalid input syntax for type numeric: "0x8000000000000000"
-LINE 1: SELECT 0x8000000000000000;
-               ^
+      ?column?       
+---------------------
+ 9223372036854775808
+(1 row)
+
 SELECT -0b1000000000000000000000000000000000000000000000000000000000000000;
        ?column?       
 ----------------------
@@ -134,9 +140,11 @@ SELECT -0b1000000000000000000000000000000000000000000000000000000000000000;
 (1 row)
 
 SELECT -0b1000000000000000000000000000000000000000000000000000000000000001;
-ERROR:  invalid input syntax for type numeric: "-0b1000000000000000000000000000000000000000000000000000000000000001"
-LINE 1: SELECT -0b10000000000000000000000000000000000000000000000000...
-               ^
+       ?column?       
+----------------------
+ -9223372036854775809
+(1 row)
+
 SELECT -0o1000000000000000000000;
        ?column?       
 ----------------------
@@ -144,9 +152,11 @@ SELECT -0o1000000000000000000000;
 (1 row)
 
 SELECT -0o1000000000000000000001;
-ERROR:  invalid input syntax for type numeric: "-0o1000000000000000000001"
-LINE 1: SELECT -0o1000000000000000000001;
-               ^
+       ?column?       
+----------------------
+ -9223372036854775809
+(1 row)
+
 SELECT -0x8000000000000000;
        ?column?       
 ----------------------
@@ -154,9 +164,11 @@ SELECT -0x8000000000000000;
 (1 row)
 
 SELECT -0x8000000000000001;
-ERROR:  invalid input syntax for type numeric: "-0x8000000000000001"
-LINE 1: SELECT -0x8000000000000001;
-               ^
+       ?column?       
+----------------------
+ -9223372036854775809
+(1 row)
+
 -- error cases
 SELECT 123abc;
 ERROR:  trailing junk after numeric literal at or near "123a"
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
index 7bb34e5021..b04652e38b 100644
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1039,6 +1039,12 @@ INSERT INTO num_input_test(n1) VALUES (' -inf ');
 INSERT INTO num_input_test(n1) VALUES (' Infinity ');
 INSERT INTO num_input_test(n1) VALUES (' +inFinity ');
 INSERT INTO num_input_test(n1) VALUES (' -INFINITY ');
+INSERT INTO num_input_test(n1) VALUES ('0b10001110111100111100001001010');
+INSERT INTO num_input_test(n1) VALUES ('  -0B1010101101010100101010011000110011101011000111110000101011010010  ');
+INSERT INTO num_input_test(n1) VALUES ('  +0o112402761777 ');
+INSERT INTO num_input_test(n1) VALUES ('-0O001255245230633431670261');
+INSERT INTO num_input_test(n1) VALUES ('-0x0000000000000000000000000deadbeef');
+INSERT INTO num_input_test(n1) VALUES (' 0X30b1F33a6DF0bD4E64DF9BdA7D15 ');
 
 -- bad inputs
 INSERT INTO num_input_test(n1) VALUES ('     ');
@@ -1050,6 +1056,9 @@ INSERT INTO num_input_test(n1) VALUES ('5. 0   ');
 INSERT INTO num_input_test(n1) VALUES ('');
 INSERT INTO num_input_test(n1) VALUES (' N aN ');
 INSERT INTO num_input_test(n1) VALUES ('+ infinity');
+INSERT INTO num_input_test(n1) VALUES ('0b1112');
+INSERT INTO num_input_test(n1) VALUES ('0c1112');
+INSERT INTO num_input_test(n1) VALUES ('0x12.34');
 
 SELECT * FROM num_input_test;
 
@@ -1061,6 +1070,7 @@ SELECT pg_input_error_message('1e400000', 'numeric');
 SELECT pg_input_is_valid('1234.567', 'numeric(8,4)');
 SELECT pg_input_is_valid('1234.567', 'numeric(7,4)');
 SELECT pg_input_error_message('1234.567', 'numeric(7,4)');
+SELECT pg_input_error_message('0x1234.567', 'numeric');
 
 --
 -- Test precision and scale typemods
-- 
2.35.3

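A minimal standalone sketch of the digit-grouping idea behind set_var_from_non_decimal_integer_str() may help here. Digits are accumulated in a machine-word group (tmp, with mul = base raised to the number of digits in the group) and folded into an arbitrary-precision value only when the group is full. This is not the patch's code. The BigDec type, its base-10^9 limbs, and the names bigdec_mul_add() and nondecimal_to_decimal() are all hypothetical stand-ins for NumericVar and its helpers. Unlike the patch, which groups digits up to the int64 range, the sketch caps each group at 2^32 so that its limb multiply always fits in a uint64_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bignum: base-10^9 limbs, least significant first; 0 limbs = zero */
#define LIMB_BASE 1000000000u
#define MAX_LIMBS 64

typedef struct
{
	uint32_t	limbs[MAX_LIMBS];
	int			nlimbs;
} BigDec;

/* d = d * mul + add; returns 0 if the result would not fit in MAX_LIMBS */
static int
bigdec_mul_add(BigDec *d, uint64_t mul, uint64_t add)
{
	uint64_t	carry = add;

	for (int i = 0; i < d->nlimbs; i++)
	{
		/* limb < 2^30 and mul <= 2^32, so this cannot overflow uint64_t */
		uint64_t	t = (uint64_t) d->limbs[i] * mul + carry;

		d->limbs[i] = (uint32_t) (t % LIMB_BASE);
		carry = t / LIMB_BASE;
	}
	for (; carry != 0; carry /= LIMB_BASE)
	{
		if (d->nlimbs == MAX_LIMBS)
			return 0;
		d->limbs[d->nlimbs++] = (uint32_t) (carry % LIMB_BASE);
	}
	return 1;
}

/* Value of c as a digit in the given base, or -1 if it is not one */
static int
digit_value(char c, int base)
{
	int			v;

	if (c >= '0' && c <= '9') v = c - '0';
	else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
	else return -1;
	return v < base ? v : -1;
}

/*
 * Convert "0x...", "0o..." or "0b..." (no sign, no spaces) to a decimal
 * string in out (at least MAX_LIMBS * 9 + 1 bytes).  Returns 0 on a
 * syntax error or overflow, 1 on success.
 */
static int
nondecimal_to_decimal(const char *s, char *out)
{
	BigDec		d = {.nlimbs = 0};
	uint64_t	tmp = 0, mul = 1;
	const char *first;
	int			base, v, pos;

	if (s[0] != '0')
		return 0;
	switch (s[1])
	{
		case 'x': case 'X': base = 16; break;
		case 'o': case 'O': base = 8; break;
		case 'b': case 'B': base = 2; break;
		default: return 0;
	}
	first = s += 2;
	while ((v = digit_value(*s, base)) >= 0)
	{
		/* Group is full: fold its contribution into d and start a new one */
		if (mul > (UINT64_C(1) << 32) / base)
		{
			if (!bigdec_mul_add(&d, mul, tmp))
				return 0;
			tmp = 0;
			mul = 1;
		}
		tmp = tmp * base + v;
		mul *= base;
		s++;
	}
	/* Need at least one digit, nothing after the digits, and the final group */
	if (s == first || *s != '\0' || !bigdec_mul_add(&d, mul, tmp))
		return 0;

	if (d.nlimbs == 0)
		return strcpy(out, "0"), 1;
	pos = sprintf(out, "%u", (unsigned) d.limbs[d.nlimbs - 1]);
	for (int i = d.nlimbs - 2; i >= 0; i--)
		pos += sprintf(out + pos, "%09u", (unsigned) d.limbs[i]);
	return 1;
}
```

For example, nondecimal_to_decimal("0x8000000000000000", buf) should yield "9223372036854775808", matching the numerology.out output above, while "0b1112" and "0x12.34" are rejected, as in the regression tests.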
0002-Add-underscore-support-to-type-numeric.patch

From fc354a7d0bfeca3b565f71f39416a9c5c99700fb Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Fri, 13 Jan 2023 09:33:58 +0000
Subject: [PATCH 2/2] Add underscore support to type numeric.

XXX: No parser support for such inputs yet. Merge with Peter's patch?
---
 src/backend/utils/adt/numeric.c       | 106 ++++++++++++++++++++------
 src/test/regress/expected/numeric.out |  62 ++++++++++++++-
 src/test/regress/sql/numeric.sql      |  22 +++++-
 3 files changed, 160 insertions(+), 30 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index ed592841dc..3ef4541c0f 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -6924,10 +6924,7 @@ set_var_from_str(const char *str, const char *cp,
 	}
 
 	if (!isdigit((unsigned char) *cp))
-		ereturn(escontext, false,
-				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type %s: \"%s\"",
-						"numeric", str)));
+		goto invalid_syntax;
 
 	decdigits = (unsigned char *) palloc(strlen(cp) + DEC_DIGITS * 2);
 
@@ -6948,12 +6945,19 @@ set_var_from_str(const char *str, const char *cp,
 		else if (*cp == '.')
 		{
 			if (have_dp)
-				ereturn(escontext, false,
-						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-						 errmsg("invalid input syntax for type %s: \"%s\"",
-								"numeric", str)));
+				goto invalid_syntax;
 			have_dp = true;
 			cp++;
+			/* decimal point must not be followed by underscore */
+			if (*cp == '_')
+				goto invalid_syntax;
+		}
+		else if (*cp == '_')
+		{
+			/* underscore must be followed by more digits */
+			cp++;
+			if (!isdigit((unsigned char) *cp))
+				goto invalid_syntax;
 		}
 		else
 			break;
@@ -6966,17 +6970,8 @@ set_var_from_str(const char *str, const char *cp,
 	/* Handle exponent, if any */
 	if (*cp == 'e' || *cp == 'E')
 	{
-		long		exponent;
-		char	   *endptr;
-
-		cp++;
-		exponent = strtol(cp, &endptr, 10);
-		if (endptr == cp)
-			ereturn(escontext, false,
-					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-					 errmsg("invalid input syntax for type %s: \"%s\"",
-							"numeric", str)));
-		cp = endptr;
+		long		exponent = 0;
+		bool		neg = false;
 
 		/*
 		 * At this point, dweight and dscale can't be more than about
@@ -6986,10 +6981,43 @@ set_var_from_str(const char *str, const char *cp,
 		 * fit in storage format, make_result() will complain about it later;
 		 * for consistency use the same ereport errcode/text as make_result().
 		 */
-		if (exponent >= INT_MAX / 2 || exponent <= -(INT_MAX / 2))
-			ereturn(escontext, false,
-					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-					 errmsg("value overflows numeric format")));
+
+		/* exponent sign */
+		cp++;
+		if (*cp == '+')
+			cp++;
+		else if (*cp == '-')
+		{
+			neg = true;
+			cp++;
+		}
+
+		/* exponent digits */
+		if (!isdigit((unsigned char) *cp))
+			goto invalid_syntax;
+
+		while (*cp)
+		{
+			if (isdigit((unsigned char) *cp))
+			{
+				exponent = exponent * 10 + (*cp++ - '0');
+				if (exponent > INT_MAX / 2)
+					goto out_of_range;
+			}
+			else if (*cp == '_')
+			{
+				/* underscore must be followed by more digits */
+				cp++;
+				if (!isdigit((unsigned char) *cp))
+					goto invalid_syntax;
+			}
+			else
+				break;
+		}
+
+		if (neg)
+			exponent = -exponent;
+
 		dweight += (int) exponent;
 		dscale -= (int) exponent;
 		if (dscale < 0)
@@ -7041,6 +7069,17 @@ set_var_from_str(const char *str, const char *cp,
 	*endptr = cp;
 
 	return true;
+
+out_of_range:
+	ereturn(escontext, false,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value overflows numeric format")));
+
+invalid_syntax:
+	ereturn(escontext, false,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"numeric", str)));
 }
 
 
@@ -7118,6 +7157,13 @@ set_var_from_non_decimal_integer_str(const char *str, const char *cp, int sign,
 				tmp = tmp * 16 + xdigit_value(*cp++);
 				mul = mul * 16;
 			}
+			else if (*cp == '_')
+			{
+				/* Underscore must be followed by more digits */
+				cp++;
+				if (!isxdigit((unsigned char) *cp))
+					goto invalid_syntax;
+			}
 			else
 				break;
 		}
@@ -7148,6 +7194,13 @@ set_var_from_non_decimal_integer_str(const char *str, const char *cp, int sign,
 				tmp = tmp * 8 + (*cp++ - '0');
 				mul = mul * 8;
 			}
+			else if (*cp == '_')
+			{
+				/* Underscore must be followed by more digits */
+				cp++;
+				if (*cp < '0' || *cp > '7')
+					goto invalid_syntax;
+			}
 			else
 				break;
 		}
@@ -7178,6 +7231,13 @@ set_var_from_non_decimal_integer_str(const char *str, const char *cp, int sign,
 				tmp = tmp * 2 + (*cp++ - '0');
 				mul = mul * 2;
 			}
+			else if (*cp == '_')
+			{
+				/* Underscore must be followed by more digits */
+				cp++;
+				if (*cp < '0' || *cp > '1')
+					goto invalid_syntax;
+			}
 			else
 				break;
 		}
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
index 4eccc2086d..a3fe3c771a 100644
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2144,12 +2144,17 @@ INSERT INTO num_input_test(n1) VALUES (' -inf ');
 INSERT INTO num_input_test(n1) VALUES (' Infinity ');
 INSERT INTO num_input_test(n1) VALUES (' +inFinity ');
 INSERT INTO num_input_test(n1) VALUES (' -INFINITY ');
+INSERT INTO num_input_test(n1) VALUES ('12_000_000_000');
+INSERT INTO num_input_test(n1) VALUES ('12_000.123_456');
+INSERT INTO num_input_test(n1) VALUES ('23_000_000_000e-1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e+1_1');
 INSERT INTO num_input_test(n1) VALUES ('0b10001110111100111100001001010');
-INSERT INTO num_input_test(n1) VALUES ('  -0B1010101101010100101010011000110011101011000111110000101011010010  ');
+INSERT INTO num_input_test(n1) VALUES ('  -0B_1010_1011_0101_0100_1010_1001_1000_1100_1110_1011_0001_1111_0000_1010_1101_0010  ');
 INSERT INTO num_input_test(n1) VALUES ('  +0o112402761777 ');
-INSERT INTO num_input_test(n1) VALUES ('-0O001255245230633431670261');
+INSERT INTO num_input_test(n1) VALUES ('-0O0012_5524_5230_6334_3167_0261');
 INSERT INTO num_input_test(n1) VALUES ('-0x0000000000000000000000000deadbeef');
-INSERT INTO num_input_test(n1) VALUES (' 0X30b1F33a6DF0bD4E64DF9BdA7D15 ');
+INSERT INTO num_input_test(n1) VALUES (' 0X_30b1_F33a_6DF0_bD4E_64DF_9BdA_7D15 ');
 -- bad inputs
 INSERT INTO num_input_test(n1) VALUES ('     ');
 ERROR:  invalid input syntax for type numeric: "     "
@@ -2187,6 +2192,38 @@ INSERT INTO num_input_test(n1) VALUES ('+ infinity');
 ERROR:  invalid input syntax for type numeric: "+ infinity"
 LINE 1: INSERT INTO num_input_test(n1) VALUES ('+ infinity');
                                                ^
+INSERT INTO num_input_test(n1) VALUES ('_123');
+ERROR:  invalid input syntax for type numeric: "_123"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('_123');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('123_');
+ERROR:  invalid input syntax for type numeric: "123_"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('123_');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('12__34');
+ERROR:  invalid input syntax for type numeric: "12__34"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('12__34');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('123_.456');
+ERROR:  invalid input syntax for type numeric: "123_.456"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('123_.456');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('123._456');
+ERROR:  invalid input syntax for type numeric: "123._456"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('123._456');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('1.2e_34');
+ERROR:  invalid input syntax for type numeric: "1.2e_34"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('1.2e_34');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('1.2e34_');
+ERROR:  invalid input syntax for type numeric: "1.2e34_"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('1.2e34_');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('1.2e3__4');
+ERROR:  invalid input syntax for type numeric: "1.2e3__4"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('1.2e3__4');
+                                               ^
 INSERT INTO num_input_test(n1) VALUES ('0b1112');
 ERROR:  invalid input syntax for type numeric: "0b1112"
 LINE 1: INSERT INTO num_input_test(n1) VALUES ('0b1112');
@@ -2199,6 +2236,18 @@ INSERT INTO num_input_test(n1) VALUES ('0x12.34');
 ERROR:  invalid input syntax for type numeric: "0x12.34"
 LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x12.34');
                                                ^
+INSERT INTO num_input_test(n1) VALUES ('0x__1234');
+ERROR:  invalid input syntax for type numeric: "0x__1234"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x__1234');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('0x1234_');
+ERROR:  invalid input syntax for type numeric: "0x1234_"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x1234_');
+                                               ^
+INSERT INTO num_input_test(n1) VALUES ('0x12__34');
+ERROR:  invalid input syntax for type numeric: "0x12__34"
+LINE 1: INSERT INTO num_input_test(n1) VALUES ('0x12__34');
+                                               ^
 SELECT * FROM num_input_test;
                 n1                 
 -----------------------------------
@@ -2215,13 +2264,18 @@ SELECT * FROM num_input_test;
                           Infinity
                           Infinity
                          -Infinity
+                       12000000000
+                      12000.123456
+                      2.3000000000
+                              1.23
+                              12.3
                          299792458
              -12345678901234567890
                         9999999999
              -12345678900987654321
                        -3735928559
  987654321234567898765432123456789
-(19 rows)
+(24 rows)
 
 -- Also try it with non-error-throwing API
 SELECT pg_input_is_valid('34.5', 'numeric');
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
index b04652e38b..9c160e7d0d 100644
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1039,12 +1039,17 @@ INSERT INTO num_input_test(n1) VALUES (' -inf ');
 INSERT INTO num_input_test(n1) VALUES (' Infinity ');
 INSERT INTO num_input_test(n1) VALUES (' +inFinity ');
 INSERT INTO num_input_test(n1) VALUES (' -INFINITY ');
+INSERT INTO num_input_test(n1) VALUES ('12_000_000_000');
+INSERT INTO num_input_test(n1) VALUES ('12_000.123_456');
+INSERT INTO num_input_test(n1) VALUES ('23_000_000_000e-1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e1_0');
+INSERT INTO num_input_test(n1) VALUES ('.000_000_000_123e+1_1');
 INSERT INTO num_input_test(n1) VALUES ('0b10001110111100111100001001010');
-INSERT INTO num_input_test(n1) VALUES ('  -0B1010101101010100101010011000110011101011000111110000101011010010  ');
+INSERT INTO num_input_test(n1) VALUES ('  -0B_1010_1011_0101_0100_1010_1001_1000_1100_1110_1011_0001_1111_0000_1010_1101_0010  ');
 INSERT INTO num_input_test(n1) VALUES ('  +0o112402761777 ');
-INSERT INTO num_input_test(n1) VALUES ('-0O001255245230633431670261');
+INSERT INTO num_input_test(n1) VALUES ('-0O0012_5524_5230_6334_3167_0261');
 INSERT INTO num_input_test(n1) VALUES ('-0x0000000000000000000000000deadbeef');
-INSERT INTO num_input_test(n1) VALUES (' 0X30b1F33a6DF0bD4E64DF9BdA7D15 ');
+INSERT INTO num_input_test(n1) VALUES (' 0X_30b1_F33a_6DF0_bD4E_64DF_9BdA_7D15 ');
 
 -- bad inputs
 INSERT INTO num_input_test(n1) VALUES ('     ');
@@ -1056,9 +1061,20 @@ INSERT INTO num_input_test(n1) VALUES ('5. 0   ');
 INSERT INTO num_input_test(n1) VALUES ('');
 INSERT INTO num_input_test(n1) VALUES (' N aN ');
 INSERT INTO num_input_test(n1) VALUES ('+ infinity');
+INSERT INTO num_input_test(n1) VALUES ('_123');
+INSERT INTO num_input_test(n1) VALUES ('123_');
+INSERT INTO num_input_test(n1) VALUES ('12__34');
+INSERT INTO num_input_test(n1) VALUES ('123_.456');
+INSERT INTO num_input_test(n1) VALUES ('123._456');
+INSERT INTO num_input_test(n1) VALUES ('1.2e_34');
+INSERT INTO num_input_test(n1) VALUES ('1.2e34_');
+INSERT INTO num_input_test(n1) VALUES ('1.2e3__4');
 INSERT INTO num_input_test(n1) VALUES ('0b1112');
 INSERT INTO num_input_test(n1) VALUES ('0c1112');
 INSERT INTO num_input_test(n1) VALUES ('0x12.34');
+INSERT INTO num_input_test(n1) VALUES ('0x__1234');
+INSERT INTO num_input_test(n1) VALUES ('0x1234_');
+INSERT INTO num_input_test(n1) VALUES ('0x12__34');
 
 SELECT * FROM num_input_test;
 
-- 
2.35.3

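The underscore rule that the 0002 patch adds to set_var_from_str() and set_var_from_non_decimal_integer_str() reduces to one check: every underscore must be followed by another digit of the same base. The following is a minimal sketch of that rule for a plain digit run. The name strip_digit_separators() and its allow_leading flag are hypothetical. The flag models the difference between "0x_30b1..." (accepted after a base prefix) and "_123" (rejected). Decimal points and exponents are not handled here. Also, unlike the patch, which stops at the first non-digit and leaves the caller to check what follows, this sketch rejects any other character outright.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if c is a valid digit in the given base (2, 8, 10 or 16) */
static int
is_base_digit(char c, int base)
{
	int			v;

	if (c >= '0' && c <= '9')
		v = c - '0';
	else if (c >= 'a' && c <= 'f')
		v = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		v = c - 'A' + 10;
	else
		return 0;
	return v < base;
}

/*
 * Copy the digits of src to dst, dropping underscores.  Every underscore
 * must be followed by another digit; a leading underscore is accepted only
 * if allow_leading is set.  Returns the number of digits copied, or -1 on
 * a syntax error (dst is then left unterminated).
 */
static int
strip_digit_separators(const char *src, int base, int allow_leading, char *dst)
{
	int			n = 0;

	if (*src == '_' && !allow_leading)
		return -1;
	while (*src)
	{
		if (is_base_digit(*src, base))
			dst[n++] = *src++;
		else if (*src == '_' && is_base_digit(src[1], base))
			src++;				/* skip a separator that precedes a digit */
		else
			return -1;			/* "__", trailing "_", or a non-digit */
	}
	dst[n] = '\0';
	return n > 0 ? n : -1;
}
```

With this rule, "12_000_000_000" reduces to "12000000000", while "123_", "12__34" and "0x" followed by "__1234" are rejected, consistent with the bad-input cases in the regression tests above.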
#56

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Dean Rasheed (#55)

Re: Non-decimal integer literals

On 13.01.23 11:01, Dean Rasheed wrote:

> So I'm feeling quite good about the end result -- I set out hoping not
> to make performance noticeably worse, but ended up making it
> significantly better.

This is great! How do you want to proceed? You also posted an updated
patch in the "underscores" thread and suggested some additional work
there. In which order should these be addressed, in your opinion?

#57

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Peter Eisentraut (#56)

Re: Non-decimal integer literals

On Mon, 23 Jan 2023 at 15:55, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

> On 13.01.23 11:01, Dean Rasheed wrote:
>
>> So I'm feeling quite good about the end result -- I set out hoping not
>> to make performance noticeably worse, but ended up making it
>> significantly better.
>
> This is great! How do you want to proceed? You also posted an updated
> patch in the "underscores" thread and suggested some additional work
> there. In which order should these be addressed, in your opinion?

I think it makes most sense if I push 0001 now, and then merge 0002
into the underscores patch. I think at least one of the suggested
changes to the underscores patch required 0002 to work.

Regards,
Dean

#58

Joel Jacobson

joel@compiler.org

almost 3 years ago

In reply to: Dean Rasheed (#55)

Re: Non-decimal integer literals

On Fri, Jan 13, 2023, at 07:01, Dean Rasheed wrote:

> Attachments:
> * 0001-Add-non-decimal-integer-support-to-type-numeric.patch

Nice! This also simplifies when dealing with non-negative integers represented as byte arrays,
common in e.g. cryptography code.

Before, one had to implement numeric_from_bytes(bytea) in plpgsql [1],
which can now be greatly simplified:

create function numeric_from_bytes(bytea) returns numeric language sql as $$
select ('0'||right($1::text,-1))::numeric
$$;

\timing
select numeric_from_bytes(('\x'||repeat('0123456789abcdef',1000))::bytea);
Time: 484.223 ms -- HEAD + plpgsql numeric_from_bytes()
Time: 19.790 ms -- 0001 + simplified numeric_from_bytes()

About 25x faster!

Would we want a built-in function for this?
To avoid the text casts, but also to improve user-friendliness,
since the improved solution is still a hack that a user needing it has to somehow come up with or find.
The topic "Convert hex in text representation to decimal number" is an old one on Stack Overflow [2],
posted 11 years ago, with a myriad of hackish solutions, one of which had a bug that I reported.
Many other modern languages seem to have this as a built-in or in their standard libraries:
Python3:
classmethod int.from_bytes(bytes, byteorder='big', *, signed=False)
Rust:
pub const fn from_be_bytes(bytes: [u8; 8]) -> u64

/Joel

[1]: https://gist.github.com/joelonsql/f54552db1f0fd6d9b3397d255e51f58a
[2]: https://stackoverflow.com/questions/8316164/convert-hex-in-text-representation-to-decimal-number
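To make the comparison with Python's int.from_bytes(bytes, 'big') concrete, here is a minimal C sketch of the conversion that numeric_from_bytes() performs via the "0x" text cast: an unsigned, big-endian byte array turned into a decimal string. bytes_to_decimal() is a hypothetical name, not an existing PostgreSQL or libc function. The sketch uses schoolbook multiply-by-256-and-add over base-10^9 limbs rather than anything from numeric.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LIMB_BASE 1000000000u

/*
 * Return a malloc'd decimal string for the unsigned big-endian integer in
 * bytes[0..len-1], or NULL if memory allocation fails.
 */
static char *
bytes_to_decimal(const unsigned char *bytes, size_t len)
{
	/* 256^len has at most ~2.41 * len digits, so len / 3 + 2 limbs suffice */
	size_t		cap = len / 3 + 2;
	uint32_t   *limbs = malloc(cap * sizeof(uint32_t));
	size_t		n = 0;
	char	   *out;
	int			pos;

	if (limbs == NULL)
		return NULL;
	for (size_t i = 0; i < len; i++)
	{
		/* value = value * 256 + bytes[i], limb by limb */
		uint64_t	carry = bytes[i];

		for (size_t j = 0; j < n; j++)
		{
			uint64_t	t = (uint64_t) limbs[j] * 256 + carry;

			limbs[j] = (uint32_t) (t % LIMB_BASE);
			carry = t / LIMB_BASE;
		}
		for (; carry != 0; carry /= LIMB_BASE)
			limbs[n++] = (uint32_t) (carry % LIMB_BASE);
	}

	out = malloc(9 * (n ? n : 1) + 1);
	if (out == NULL)
	{
		free(limbs);
		return NULL;
	}
	if (n == 0)
		strcpy(out, "0");
	else
	{
		/* Most significant limb unpadded, the rest zero-padded to 9 digits */
		pos = sprintf(out, "%u", (unsigned) limbs[n - 1]);
		for (size_t j = n - 1; j-- > 0;)
			pos += sprintf(out + pos, "%09u", (unsigned) limbs[j]);
	}
	free(limbs);
	return out;
}
```

For instance, the bytes de ad be ef give "3735928559", the same value as the '-0x...deadbeef' regression test without its sign.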

#59

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Joel Jacobson (#58)

Re: Non-decimal integer literals

On Mon, 23 Jan 2023 at 20:00, Joel Jacobson <joel@compiler.org> wrote:

> Nice! This also simplifies when dealing with non-negative integers represented as byte arrays,
> common in e.g. cryptography code.

Ah, interesting. I hadn't thought of that use-case.

> create function numeric_from_bytes(bytea) returns numeric language sql as $$
> select ('0'||right($1::text,-1))::numeric
> $$;

> Would we want a built-in function for this?

Not sure. It does feel a bit niche. It's quite common in other
programming languages, but that doesn't mean that a lot of Postgres
users need it. Perhaps start a new thread to gauge people's interest?

Regards,
Dean

#60

Ranier Vilela

ranier.vf@gmail.com

almost 3 years ago

In reply to: Dean Rasheed (#59)

1 attachment(s)

Re: Non-decimal integer literals

On 13.01.23 11:01, Dean Rasheed wrote:

So I'm feeling quite good about the end result -- I set out hoping not
to make performance noticeably worse, but ended up making it
significantly better.

Hi Dean, thanks for your work.

But since PG_RETURN_NULL is a simple return,
isn't the "value" var leaked now?

If not, sorry for the noise.

regards,
Ranier Vilela

Attachments:

avoid_leak_value_numeric.patch (application/octet-stream)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 898c52099b..93d9bcce93 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -753,14 +753,20 @@ numeric_in(PG_FUNCTION_ARGS)
 		if (base == 10)
 		{
 			if (!set_var_from_str(str, cp, &value, &cp, escontext))
+			{
+				free_var(&value);
 				PG_RETURN_NULL();
+			}
 			value.sign = sign;
 		}
 		else
 		{
 			if (!set_var_from_non_decimal_integer_str(str, cp + 2, sign, base,
 													  &value, &cp, escontext))
+			{
+				free_var(&value);
 				PG_RETURN_NULL();
+			}
 		}
 
 		/*
@@ -775,7 +781,10 @@ numeric_in(PG_FUNCTION_ARGS)
 		}
 
 		if (!apply_typmod(&value, typmod, escontext))
+		{
+			free_var(&value);
 			PG_RETURN_NULL();
+		}
 
 		res = make_result_opt_error(&value, &have_error);


#61

Dean Rasheed

dean.a.rasheed@gmail.com

almost 3 years ago

In reply to: Ranier Vilela (#60)

Re: Non-decimal integer literals

On Tue, 24 Jan 2023 at 00:47, Ranier Vilela <ranier.vf@gmail.com> wrote:

On 13.01.23 11:01, Dean Rasheed wrote:

So I'm feeling quite good about the end result -- I set out hoping not
to make performance noticeably worse, but ended up making it
significantly better.

Hi Dean, thanks for your work.

But since PG_RETURN_NULL is a simple return,
isn't the "value" var leaked now?

That originates from a prior commit:

ccff2d20ed Convert a few datatype input functions to use "soft" error reporting.

and see also a bunch of follow-on commits for other input functions.

It will only return NULL if the input is invalid and escontext is
non-NULL. You only identified a fraction of the cases where that would
happen. If we really cared about not leaking memory for invalid
inputs, we'd have to look at every code path using ereturn()
(including lower-level functions, and not just in numeric.c). I think
that would be a waste of time, and counterproductive -- trying to
immediately free memory for all possible invalid inputs would likely
complicate a lot of code, and slow down parsing of valid inputs.
Better to leave it until the owning memory context is freed.
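The trade-off described here can be sketched in plain C. Everything below is hypothetical and simplified: Arena stands in for a PostgreSQL memory context, SoftError for an escontext, and exit() for a hard ereport(ERROR), which in the real backend unwinds via longjmp. The point is that the error path just returns without freeing anything, and the abandoned allocation is reclaimed when the whole arena is reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a memory context: a bump allocator whose
 * allocations are all released at once by arena_reset(). */
typedef struct Arena {
    _Alignas(max_align_t) unsigned char buf[4096];
    size_t used;
} Arena;

static void *arena_alloc(Arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    n = (n + align - 1) / align * align;   /* keep later blocks aligned */
    if (n > sizeof a->buf - a->used)
        return NULL;
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

static void arena_reset(Arena *a) { a->used = 0; }

/* Hypothetical analogue of escontext: when non-NULL, the error is recorded
 * and the caller gets false back; when NULL, the error is fatal. */
typedef struct SoftError {
    bool occurred;
    const char *message;
} SoftError;

static bool report_error(SoftError *escontext, const char *msg)
{
    if (escontext == NULL) {
        fprintf(stderr, "ERROR: %s\n", msg);
        exit(EXIT_FAILURE);   /* stands in for a hard error */
    }
    escontext->occurred = true;
    escontext->message = msg;
    return false;
}

/* Copy a string of decimal digits into the arena. On invalid input the
 * partly filled copy is abandoned, not freed: it is reclaimed together
 * with everything else when the arena is reset. */
static bool parse_digits(Arena *a, const char *s, char **out,
                         SoftError *escontext)
{
    size_t n = strlen(s);
    if (n == 0)
        return report_error(escontext, "empty input");
    char *copy = arena_alloc(a, n + 1);
    if (copy == NULL)
        return report_error(escontext, "out of memory");
    for (size_t i = 0; i < n; i++) {
        if (s[i] < '0' || s[i] > '9')
            return report_error(escontext, "invalid input syntax");
        copy[i] = s[i];
    }
    copy[n] = '\0';
    *out = copy;
    return true;
}
```

Keeping the error path to a bare return means the valid-input path carries no extra bookkeeping, and one arena_reset() covers every invalid-input case at once, however deep in the call chain it was detected.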

Regards,
Dean

#62

Ranier Vilela

ranier.vf@gmail.com

almost 3 years ago

In reply to: Dean Rasheed (#61)

Re: Non-decimal integer literals

On Tue, 24 Jan 2023 at 07:24, Dean Rasheed <dean.a.rasheed@gmail.com>
wrote:

On Tue, 24 Jan 2023 at 00:47, Ranier Vilela <ranier.vf@gmail.com> wrote:

On 13.01.23 11:01, Dean Rasheed wrote:

So I'm feeling quite good about the end result -- I set out hoping not
to make performance noticeably worse, but ended up making it
significantly better.

Hi Dean, thanks for your work.

But since PG_RETURN_NULL is a simple return,
isn't the "value" var leaked now?

That originates from a prior commit:

ccff2d20ed Convert a few datatype input functions to use "soft" error
reporting.

and see also a bunch of follow-on commits for other input functions.

It will only return NULL if the input is invalid and escontext is
non-NULL. You only identified a fraction of the cases where that would
happen. If we really cared about not leaking memory for invalid
inputs, we'd have to look at every code path using ereturn()
(including lower-level functions, and not just in numeric.c). I think
that would be a waste of time, and counterproductive -- trying to
immediately free memory for all possible invalid inputs would likely
complicate a lot of code, and slow down parsing of valid inputs.
Better to leave it until the owning memory context is freed.

Thank you for the explanation.

regards,
Ranier Vilela