Table with large number of int columns, very slow COPY FROM

Started by Alex Tokarev about 8 years ago, 9 messages

dwalin@dwalin.ru

about 8 years ago

Hi,

I have a set of tables with a fairly large number of columns, mostly int with
a few bigints and short char/varchar columns. I've noticed that Postgres is
pretty slow at inserting data into such a table. I tried to tune every
possible setting: using unlogged tables, increasing shared_buffers, etc.; I even
placed the db cluster on ramfs and turned fsync off. The results are pretty
much the same, with the exception of unlogged tables, which improve
performance just a little bit.

I have made a minimally reproducible test case consisting of a table with
848 columns, inserting a partial dataset of 100,000 rows with 240 columns. On
my dev VM the COPY FROM operation takes just shy of 3 seconds to complete,
which is entirely unexpected for such a small dataset.

Here's a tarball with the test schema and data:
http://nohuhu.org/copy_perf.tar.bz2; it's 338k compressed but expands to
~50mb. Here's the result of a profiling session with perf:
https://pastebin.com/pjv7JqxD

--
Regards,
Alex.

Andreas Kretschmer

andreas@a-kretschmer.de

about 8 years ago

In reply to: Alex Tokarev (#1)

Re: Table with large number of int columns, very slow COPY FROM

On 08.12.2017 05:21, Alex Tokarev wrote:

> I have made a minimally reproducible test case consisting of a table
> with 848 columns

Such a high number of columns may be a sign of a flawed table or
database design. Why do you have so many columns? How many indexes
do you have?

Regards, Andreas

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Alex Tokarev (#1)

Re: Table with large number of int columns, very slow COPY FROM

Hi,

On 2017-12-07 20:21:45 -0800, Alex Tokarev wrote:

> I have a set of tables with a fairly large number of columns, mostly int with
> a few bigints and short char/varchar columns. I've noticed that Postgres is
> pretty slow at inserting data into such a table. I tried to tune every
> possible setting: using unlogged tables, increasing shared_buffers, etc.; I even
> placed the db cluster on ramfs and turned fsync off. The results are pretty
> much the same, with the exception of unlogged tables, which improve
> performance just a little bit.
>
> I have made a minimally reproducible test case consisting of a table with
> 848 columns, inserting a partial dataset of 100,000 rows with 240 columns. On
> my dev VM the COPY FROM operation takes just shy of 3 seconds to complete,
> which is entirely unexpected for such a small dataset.

I don't find this to be absurdly slow. On my laptop, loading with a
development checkout takes 1223.950 ms. That is about 20 million fields
parsed per second, or, counting all 848 columns of each row, about
69 million fields per second inserted. Removing the TRUNCATE
and running the COPYs concurrently scales well to a few clients, and
only stops scaling because my laptop's SSD can no longer keep up.
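
The two throughput figures can be reproduced from the numbers quoted in this thread. The following is a small sketch, not part of the original mail; the function names are made up for illustration, and it assumes the "inserted" figure counts all 848 columns of every row.

```c
#include <assert.h>
#include <stdio.h>

/* Fields handled per second for `rows` rows of `cols` columns in `ms` ms. */
double
fields_per_sec(long rows, int cols, double ms)
{
	assert(ms > 0.0);
	return (double) rows * cols / (ms / 1000.0);
}

/* Print both rates for the test case described in the thread. */
void
print_copy_throughput(void)
{
	const long	rows = 100000;		/* rows in the data file */
	const int	parsed_cols = 240;	/* columns present in the COPY input */
	const int	table_cols = 848;	/* columns in the target table */
	const double elapsed_ms = 1223.950;	/* timing reported above */

	printf("parsed:   %.1f million fields/sec\n",
		   fields_per_sec(rows, parsed_cols, elapsed_ms) / 1e6);
	printf("inserted: %.1f million fields/sec\n",
		   fields_per_sec(rows, table_cols, elapsed_ms) / 1e6);
}
```

Calling print_copy_throughput() from a main() prints the two rates, which land close to the 20 and 69 million quoted above.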

That said, I do think there are a few places that could stand some
improvement. Locally the profile shows up as:
+   15.38%  postgres  libc-2.25.so        [.] __GI_____strtoll_l_internal
+   11.79%  postgres  postgres            [.] heap_fill_tuple
+    8.00%  postgres  postgres            [.] CopyFrom
+    7.40%  postgres  postgres            [.] CopyReadLine
+    6.79%  postgres  postgres            [.] ExecConstraints
+    6.68%  postgres  postgres            [.] NextCopyFromRawFields
+    6.36%  postgres  postgres            [.] heap_compute_data_size
+    6.02%  postgres  postgres            [.] pg_atoi

The strtoll is libc functionality triggered by pg_atoi(), something I've
seen show up in numerous profiles. I think it's probably time to have
our own optimized version of it rather than relying on libc's.

That heap_fill_tuple(), which basically builds a tuple from the parsed
datums, takes time roughly proportional to the number of columns in the
table seems hard to avoid, especially because this isn't something we
want to optimize for at the price of making more common workloads with
fewer columns slower. But there seems to be quite some micro-optimization
potential.

That ExecConstraints() shows up seems unsurprising: it has to walk
through all the table's columns checking for constraints. We could
easily optimize this by keeping a separate data structure listing the
constraints, but that'd be slower in the very common case of more
reasonable numbers of columns.
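
To make the trade-off concrete, here is a simplified sketch of the two approaches for NOT NULL checks only. It is not PostgreSQL code: ColumnDesc and all three functions are hypothetical stand-ins, and the real ExecConstraints() works on tuple descriptors and slots rather than plain arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified column descriptor (not PostgreSQL's TupleDesc). */
typedef struct ColumnDesc
{
	bool		not_null;
} ColumnDesc;

/* Walk every column of the table: cost grows with the column count. */
bool
check_not_null_walk(const ColumnDesc *cols, const bool *isnull, size_t ncols)
{
	for (size_t i = 0; i < ncols; i++)
		if (cols[i].not_null && isnull[i])
			return false;
	return true;
}

/* Built once per table: indexes of the columns that are NOT NULL. */
size_t
build_not_null_list(const ColumnDesc *cols, size_t ncols, size_t *out)
{
	size_t		n = 0;

	for (size_t i = 0; i < ncols; i++)
		if (cols[i].not_null)
			out[n++] = i;
	return n;
}

/* Visit only the listed columns: cost grows with the constraint count. */
bool
check_not_null_list(const bool *isnull, const size_t *list, size_t nlist)
{
	for (size_t i = 0; i < nlist; i++)
		if (isnull[list[i]])
			return false;
	return true;
}
```

The list variant skips unconstrained columns but adds an extra indirection per check, which is one plausible reason it would lose for tables with only a handful of columns.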

The copy implementation deserves some optimization too...

> Here's a tarball with test schema and data:
> http://nohuhu.org/copy_perf.tar.bz2; it's 338k compressed but expands to
> ~50mb. Here's the result of profiling session with perf:
> https://pastebin.com/pjv7JqxD

Thanks!

Greetings,

Andres Freund

andres@anarazel.de

about 8 years ago

In reply to: Andres Freund (#3)

2 attachment(s)

Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

Hi,

On 2017-12-08 10:17:34 -0800, Andres Freund wrote:

> the strtoll is libc functionality triggered by pg_atoi(), something I've
> seen show up in numerous profiles. I think it's probably time to have
> our own optimized version of it rather than relying on libcs.

Attached is a hand-rolled version. After quickly hacking up one from
scratch, I noticed we already kind of have one for int64 (scanint8), so
I changed the structure of this one to be relatively similar.

It's currently using the overflow logic from [1], but that's not
fundamentally required, we could rely on fwrapv for this one too.
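
For readers unfamiliar with the distinction: overflow can be detected in well-defined C, without compiler flags such as -fwrapv, either by widening to a larger type or by checking the operands before multiplying. The sketch below shows both for the "multiply by 10, subtract a digit" step of a negative-accumulating parser. It is an illustration only; the function names are made up and this is not the code from the attached patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute acc * 10 - digit for acc <= 0 and digit in 0..9 by widening to
 * 64 bits. Returns true on overflow, otherwise stores the result.
 */
bool
acc_digit_widen(int32_t acc, int digit, int32_t *result)
{
	int64_t		res;

	assert(acc <= 0 && digit >= 0 && digit <= 9);
	res = (int64_t) acc * 10 - digit;
	if (res < INT32_MIN)
		return true;
	*result = (int32_t) res;
	return false;
}

/*
 * Same contract, but checks the operands first so that no intermediate
 * value ever leaves the int32 range. C division truncates toward zero,
 * which for the negative numerator here rounds up, giving the smallest
 * accumulator value that is still safe.
 */
bool
acc_digit_precheck(int32_t acc, int digit, int32_t *result)
{
	assert(acc <= 0 && digit >= 0 && digit <= 9);
	if (acc < (INT32_MIN + digit) / 10)
		return true;
	*result = acc * 10 - digit;
	return false;
}
```

Both variants agree on the boundary: accumulating the final digit 8 onto -214748364 yields INT32_MIN, while the digit 9 is reported as overflow.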

This one improves performance of the submitted workload from 1223.950ms
to 1020.640ms (best of three). The profile's shape changes quite
noticeably:

master:
+   15.38%  postgres  libc-2.25.so      [.] __GI_____strtoll_l_internal
+   11.79%  postgres  postgres          [.] heap_fill_tuple
+    8.00%  postgres  postgres          [.] CopyFrom
+    7.40%  postgres  postgres          [.] CopyReadLine
+    6.79%  postgres  postgres          [.] ExecConstraints
+    6.68%  postgres  postgres          [.] NextCopyFromRawFields
+    6.36%  postgres  postgres          [.] heap_compute_data_size
+    6.02%  postgres  postgres          [.] pg_atoi
patch:
+   13.70%  postgres  postgres          [.] heap_fill_tuple
+   10.46%  postgres  postgres          [.] CopyFrom
+    9.31%  postgres  postgres          [.] pg_strto32
+    8.39%  postgres  postgres          [.] CopyReadLine
+    7.88%  postgres  postgres          [.] ExecConstraints
+    7.63%  postgres  postgres          [.] InputFunctionCall
+    7.41%  postgres  postgres          [.] heap_compute_data_size
+    7.21%  postgres  postgres          [.] pg_verify_mbstr
+    5.49%  postgres  postgres          [.] NextCopyFromRawFields

This probably isn't going to resolve Alex's performance concerns
meaningfully, but seems quite worthwhile to do anyway.

We should probably have int8/16/64 versions coded the same way as the 32bit
version, but I decided to leave that out for now. I'm primarily interested
in comments. I wonder a bit whether it's worth providing an 'errorOk'
mode like scanint8 does, but surveying its callers suggests we should
rather change them to not need it...
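
As a rough illustration of what a non-throwing variant could look like, here is a standalone sketch that reports failure through a status code instead of ereport(). It is not the attached patch: ParseStatus and parse_int32 are hypothetical names, and it uses a 64-bit accumulator rather than the overflow helpers. Like the patch, it accumulates the value as a negative number so that the most negative int32 can be represented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes standing in for the two ereport() cases. */
typedef enum
{
	PARSE_OK,
	PARSE_SYNTAX,
	PARSE_RANGE
} ParseStatus;

/* Parse optional whitespace, optional sign, digits, optional whitespace. */
ParseStatus
parse_int32(const char *s, int32_t *result)
{
	int64_t		acc = 0;		/* accumulated as a negative number */
	bool		neg = false;

	while (isspace((unsigned char) *s))
		s++;

	if (*s == '-')
	{
		neg = true;
		s++;
	}
	else if (*s == '+')
		s++;

	/* require at least one digit */
	if (!isdigit((unsigned char) *s))
		return PARSE_SYNTAX;

	while (isdigit((unsigned char) *s))
	{
		acc = acc * 10 - (*s - '0');
		if (acc < INT32_MIN)
			return PARSE_RANGE;
		s++;
	}

	/* allow trailing whitespace, but nothing else */
	while (isspace((unsigned char) *s))
		s++;
	if (*s != '\0')
		return PARSE_SYNTAX;

	if (!neg)
	{
		/* the most negative value has no positive counterpart */
		if (acc == INT32_MIN)
			return PARSE_RANGE;
		acc = -acc;
	}

	*result = (int32_t) acc;
	return PARSE_OK;
}
```

A caller that wants the throwing behaviour can map PARSE_SYNTAX and PARSE_RANGE to the corresponding errors itself, which is one way to avoid a separate 'errorOk' flag.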

Greetings,

Andres Freund

[1]: http://archives.postgresql.org/message-id/20171030112751.mukkriz2rur2qkxc%40alap3.anarazel.de

Attachments:

0001-Provide-overflow-safe-integer-math-inline-functions.patch (text/x-diff; charset=us-ascii)

From 98fbe53be0a3046f8ace687f846f91a0043deee8 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 29 Oct 2017 22:13:54 -0700
Subject: [PATCH 1/3] Provide overflow safe integer math inline functions.

Author: Andres Freund, with some code stolen from Greg Stark
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 config/c-compiler.m4          |  22 ++++
 configure                     |  33 ++++++
 configure.in                  |   4 +
 src/include/common/int.h      | 229 ++++++++++++++++++++++++++++++++++++++++++
 src/include/pg_config.h.in    |   3 +
 src/include/pg_config.h.win32 |   3 +
 6 files changed, 294 insertions(+)
 create mode 100644 src/include/common/int.h

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 6dcc7906491..0d91e52a28f 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -296,6 +296,28 @@ fi])# PGAC_C_BUILTIN_CONSTANT_P
 
 
 
+# PGAC_C_BUILTIN_OP_OVERFLOW
+# -------------------------
+# Check if the C compiler understands __builtin_$op_overflow(),
+# and define HAVE__BUILTIN_OP_OVERFLOW if so.
+#
+# Check for the most complicated case, 64 bit multiplication, as a
+# proxy for all of the operations.
+AC_DEFUN([PGAC_C_BUILTIN_OP_OVERFLOW],
+[AC_CACHE_CHECK(for __builtin_mul_overflow, pgac_cv__builtin_op_overflow,
+[AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
+[PG_INT64_TYPE result;
+__builtin_mul_overflow((PG_INT64_TYPE) 1, (PG_INT64_TYPE) 2, &result);]
+)],
+[pgac_cv__builtin_op_overflow=yes],
+[pgac_cv__builtin_op_overflow=no])])
+if test x"$pgac_cv__builtin_op_overflow" = xyes ; then
+AC_DEFINE(HAVE__BUILTIN_OP_OVERFLOW, 1,
+          [Define to 1 if your compiler understands __builtin_$op_overflow.])
+fi])# PGAC_C_BUILTIN_OP_OVERFLOW
+
+
+
 # PGAC_C_BUILTIN_UNREACHABLE
 # --------------------------
 # Check if the C compiler understands __builtin_unreachable(),
diff --git a/configure b/configure
index 4ecd2e19224..f66899488cc 100755
--- a/configure
+++ b/configure
@@ -14467,6 +14467,39 @@ esac
 
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_mul_overflow" >&5
+$as_echo_n "checking for __builtin_mul_overflow... " >&6; }
+if ${pgac_cv__builtin_op_overflow+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+PG_INT64_TYPE result;
+__builtin_mul_overflow((PG_INT64_TYPE) 1, (PG_INT64_TYPE) 2, &result);
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv__builtin_op_overflow=yes
+else
+  pgac_cv__builtin_op_overflow=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_op_overflow" >&5
+$as_echo "$pgac_cv__builtin_op_overflow" >&6; }
+if test x"$pgac_cv__builtin_op_overflow" = xyes ; then
+
+$as_echo "#define HAVE__BUILTIN_OP_OVERFLOW 1" >>confdefs.h
+
+fi
+
 # Check size of void *, size_t (enables tweaks for > 32bit address space)
 # The cast to long int works around a bug in the HP C Compiler
 # version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
diff --git a/configure.in b/configure.in
index cea7fd07553..edf1dd2e7b8 100644
--- a/configure.in
+++ b/configure.in
@@ -1764,6 +1764,10 @@ if test $pgac_need_repl_snprintf = yes; then
   AC_LIBOBJ(snprintf)
 fi
 
+# has to be down here, rather than with the other builtins, because
+# the test uses PG_INT64_TYPE.
+PGAC_C_BUILTIN_OP_OVERFLOW
+
 # Check size of void *, size_t (enables tweaks for > 32bit address space)
 AC_CHECK_SIZEOF([void *])
 AC_CHECK_SIZEOF([size_t])
diff --git a/src/include/common/int.h b/src/include/common/int.h
new file mode 100644
index 00000000000..648cbd49f14
--- /dev/null
+++ b/src/include/common/int.h
@@ -0,0 +1,229 @@
+/*-------------------------------------------------------------------------
+ *
+ * int.h
+ *	  Routines to perform integer math, while checking for overflows.
+ *
+ * The routines in this file are intended to be well defined C, without
+ * relying on compiler flags like -fwrapv.
+ *
+ * To reduce the overhead of these routines try to use compiler intrinsics
+ * where available. That's not that important for the 16, 32 bit cases, but
+ * the 64 bit cases can be considerably faster with intrinsics. In case no
+ * intrinsics are available 128 bit math is used where available.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * src/include/common/int.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COMMON_INT_H
+#define COMMON_INT_H
+
+/*
+ * If a + b overflows, return true, otherwise store the result of a + b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_add16_overflow(int16 a, int16 b, int16 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_add_overflow(a, b, result);
+#else
+	int32 res = (int32) a + (int32) b;
+	if (res > PG_INT16_MAX || res < PG_INT16_MIN)
+		return true;
+	*result = (int16) res;
+	return false;
+#endif
+}
+
+/*
+ * If a - b overflows, return true, otherwise store the result of a - b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_sub16_overflow(int16 a, int16 b, int16 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_sub_overflow(a, b, result);
+#else
+	int32 res = (int32) a - (int32) b;
+	if (res > PG_INT16_MAX || res < PG_INT16_MIN)
+		return true;
+	*result = (int16) res;
+	return false;
+#endif
+}
+
+/*
+ * If a * b overflows, return true, otherwise store the result of a * b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_mul16_overflow(int16 a, int16 b, int16 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_mul_overflow(a, b, result);
+#else
+	int32 res = (int32) a * (int32) b;
+	if (res > PG_INT16_MAX || res < PG_INT16_MIN)
+		return true;
+	*result = (int16) res;
+	return false;
+#endif
+}
+
+/*
+ * If a + b overflows, return true, otherwise store the result of a + b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_add32_overflow(int32 a, int32 b, int32 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_add_overflow(a, b, result);
+#else
+	int64 res = (int64) a + (int64) b;
+	if (res > PG_INT32_MAX || res < PG_INT32_MIN)
+		return true;
+	*result = (int32) res;
+	return false;
+#endif
+}
+
+/*
+ * If a - b overflows, return true, otherwise store the result of a - b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_sub32_overflow(int32 a, int32 b, int32 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_sub_overflow(a, b, result);
+#else
+	int64 res = (int64) a - (int64) b;
+	if (res > PG_INT32_MAX || res < PG_INT32_MIN)
+		return true;
+	*result = (int32) res;
+	return false;
+#endif
+}
+
+/*
+ * If a * b overflows, return true, otherwise store the result of a * b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_mul32_overflow(int32 a, int32 b, int32 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_mul_overflow(a, b, result);
+#else
+	int64 res = (int64) a * (int64) b;
+	if (res > PG_INT32_MAX || res < PG_INT32_MIN)
+		return true;
+	*result = (int32) res;
+	return false;
+#endif
+}
+
+/*
+ * If a + b overflows, return true, otherwise store the result of a + b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_add64_overflow(int64 a, int64 b, int64 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_add_overflow(a, b, result);
+#elif defined(HAVE_INT128)
+	int128 res = (int128) a + (int128) b;
+	if (res > PG_INT64_MAX || res < PG_INT64_MIN)
+		return true;
+	*result = (int64) res;
+	return false;
+#else
+	if ((a > 0 && b > 0 && a > PG_INT64_MAX - b) ||
+		(a < 0 && b < 0 && a < PG_INT64_MIN - b))
+		return true;
+	*result = a + b;
+	return false;
+#endif
+}
+
+/*
+ * If a - b overflows, return true, otherwise store the result of a - b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_sub64_overflow(int64 a, int64 b, int64 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_sub_overflow(a, b, result);
+#elif defined(HAVE_INT128)
+	int128 res = (int128) a - (int128) b;
+	if (res > PG_INT64_MAX || res < PG_INT64_MIN)
+		return true;
+	*result = (int64) res;
+	return false;
+#else
+	if ((a < 0 && b > 0 && a < PG_INT64_MIN + b) ||
+		(a > 0 && b < 0 && a > PG_INT64_MAX + b))
+		return true;
+	*result = a - b;
+	return false;
+#endif
+}
+
+/*
+ * If a * b overflows, return true, otherwise store the result of a * b into
+ * *result. The content of *result is implementation defined in case of
+ * overflow.
+ */
+static inline bool
+pg_mul64_overflow(int64 a, int64 b, int64 *result)
+{
+#if defined(HAVE__BUILTIN_OP_OVERFLOW)
+	return __builtin_mul_overflow(a, b, result);
+#elif defined(HAVE_INT128)
+	int128 res = (int128) a * (int128) b;
+	if (res > PG_INT64_MAX || res < PG_INT64_MIN)
+		return true;
+	*result = (int64) res;
+	return false;
+#else
+	/* Overflow can only happen if at least one value is outside the range
+	 * sqrt(min)..sqrt(max) so check that first as the division can be quite a bit
+	 * more expensive than the multiplication.
+	 *
+	 * Multiplying by 0 or 1 can't overflow of course and checking for 0
+	 * separately avoids any risk of dividing by 0.  Be careful about dividing
+	 * INT_MIN by -1 also, note reversing the a and b to ensure we're always
+	 * dividing it by a positive value.
+	 *
+	 */
+	if ((a > PG_INT32_MAX || a < PG_INT32_MIN  ||
+		 b > PG_INT32_MAX || b < PG_INT32_MIN) &&
+		a != 0 && a != 1 && b != 0 && b != 1 &&
+		((a > 0 && b > 0 && a > PG_INT64_MAX / b) ||
+		 (a > 0 && b < 0 && b < PG_INT64_MIN / a) ||
+		 (a < 0 && b > 0 && a < PG_INT64_MIN / b) ||
+		 (a < 0 && b < 0 && a < PG_INT64_MAX / b)))
+	{
+		return true;
+	}
+	*result = a * b;
+	return false;
+#endif
+}
+
+#endif /* COMMON_INT_H */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index cfdcc5ac62f..dab6d41f5e0 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -687,6 +687,9 @@
 /* Define to 1 if your compiler understands __builtin_constant_p. */
 #undef HAVE__BUILTIN_CONSTANT_P
 
+/* Define to 1 if your compiler understands __builtin_$op_overflow. */
+#undef HAVE__BUILTIN_OP_OVERFLOW
+
 /* Define to 1 if your compiler understands __builtin_types_compatible_p. */
 #undef HAVE__BUILTIN_TYPES_COMPATIBLE_P
 
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index ab9b941e89d..7dbf67ddf69 100644
--- a/src/include/pg_config.h.win32
+++ b/src/include/pg_config.h.win32
@@ -512,6 +512,9 @@
 /* Define to 1 if your compiler understands __builtin_constant_p. */
 /* #undef HAVE__BUILTIN_CONSTANT_P */
 
+/* Define to 1 if your compiler understands __builtin_$op_overflow. */
+/* #undef HAVE__BUILTIN_OP_OVERFLOW */
+
 /* Define to 1 if your compiler understands __builtin_types_compatible_p. */
 /* #undef HAVE__BUILTIN_TYPES_COMPATIBLE_P */
 
-- 
2.14.1.536.g6867272d5b.dirty

0001-Hand-code-string-to-int32-conversion-for-performance.patch (text/x-diff; charset=us-ascii)

From 64e24e2b8304619a305c8000b12d825e3b80aae5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 8 Dec 2017 13:31:15 -0800
Subject: [PATCH] Hand code string to int32 conversion for performance.

---
 src/backend/utils/adt/int.c      |  2 +-
 src/backend/utils/adt/numutils.c | 90 ++++++++++++++++++++++++++++++++++++++++
 src/include/utils/builtins.h     |  1 +
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 4cd8960b3fc..8af4f8f3f7a 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -267,7 +267,7 @@ int4in(PG_FUNCTION_ARGS)
 {
 	char	   *num = PG_GETARG_CSTRING(0);
 
-	PG_RETURN_INT32(pg_atoi(num, sizeof(int32), '\0'));
+	PG_RETURN_INT32(pg_strto32(num));
 }
 
 /*
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 244904ea940..f2281f86dae 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -18,6 +18,7 @@
 #include <limits.h>
 #include <ctype.h>
 
+#include "common/int.h"
 #include "utils/builtins.h"
 
 /*
@@ -108,6 +109,95 @@ pg_atoi(const char *s, int size, int c)
 	return (int32) l;
 }
 
+
+/*
+ * Convert input string to a 32 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. This will
+ * throw ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int32
+pg_strto32(const char *s)
+{
+	const char *in = s;
+	int32		tmp = 0;
+	bool		neg = 0;
+
+
+	/* skip leading spaces */
+	while (likely(*in) && isspace((unsigned char) *in))
+		in++;
+
+	/* handle sign */
+	if (*in == '-')
+	{
+		in++;
+		neg = true;
+	}
+	else if (*in == '+')
+		in++;
+
+	/* require at least one digit */
+	if (unlikely(!isdigit((unsigned char) *in)))
+		goto err;
+
+	/* process digits */
+	while (true)
+	{
+		if (!*in)
+			goto out;
+		if (!isdigit((unsigned char) *in))
+			goto checkspace;
+
+		/* accumulate input */
+		if (unlikely(pg_mul32_overflow(tmp, 10, &tmp)) ||
+			unlikely(pg_sub32_overflow(tmp, *in - '0', &tmp)))
+			goto overflow;
+		in++;
+	}
+
+checkspace:
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*in != '\0' && isspace((unsigned char) *in))
+		in++;
+
+	if (unlikely(*in != '\0'))
+		goto err;
+
+out:
+	/*
+	 * Accumulated input as a negative number, so adjust if that's not what's
+	 * needed.
+	 */
+	if (!neg)
+	{
+		/* could fail if input is most negative number */
+		if (unlikely(tmp == PG_INT32_MIN))
+			goto overflow;
+
+		return -tmp;
+	}
+
+	return tmp;
+
+overflow:
+	ereport(ERROR,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value \"%s\" is out of range for type %s",
+					s, "integer")));
+
+err:
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for integer: \"%s\"",
+					s)));
+}
+
+
 /*
  * pg_itoa: converts a signed 16-bit integer to its string representation
  *
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 762532f6369..fa45c84b752 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -43,6 +43,7 @@ extern int	namestrcmp(Name name, const char *str);
 
 /* numutils.c */
 extern int32 pg_atoi(const char *s, int size, int c);
+extern int32 pg_strto32(const char *s);
 extern void pg_itoa(int16 i, char *a);
 extern void pg_ltoa(int32 l, char *a);
 extern void pg_lltoa(int64 ll, char *a);
-- 
2.14.1.536.g6867272d5b.dirty

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Andres Freund (#4)

1 attachment(s)

Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

Hi,

On 2017-12-08 13:44:37 -0800, Andres Freund wrote:

> On 2017-12-08 10:17:34 -0800, Andres Freund wrote:
>
> > the strtoll is libc functionality triggered by pg_atoi(), something I've
> > seen show up in numerous profiles. I think it's probably time to have
> > our own optimized version of it rather than relying on libcs.
>
> Attached is a hand-rolled version. After quickly hacking up one from
> scratch, I noticed we already kind of have one for int64 (scanint8), so
> I changed the structure of this one to be relatively similar.
>
> It's currently using the overflow logic from [1], but that's not
> fundamentally required, we could rely on fwrapv for this one too.
>
> This one improves performance of the submitted workload from 1223.950ms
> to 1020.640ms (best of three). The profile's shape changes quite
> noticeably:

FWIW, here's a rebased version of this patch. It could probably be polished
further. One might argue that we should make more wide-ranging
changes and unify scanint8 and pg_atoi with it as well. But it
might also be worthwhile to apply it without those, given the
performance benefit.

Anybody have an opinion on that?

Greetings,

Andres Freund

Attachments:

v1-0001-Hand-code-string-to-integer-conversion-for-perfor.patch (text/x-diff; charset=us-ascii)

From a31bdd83fe02fc228263d099f5a4d2d4611970fc Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 8 Dec 2017 13:31:15 -0800
Subject: [PATCH v1] Hand code string to integer conversion for performance.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hmjpk@alap3.anarazel.de
Backpatch:
---
 contrib/dblink/expected/dblink.out            |   2 +-
 .../postgres_fdw/expected/postgres_fdw.out    |   8 +-
 contrib/spi/refint.c                          |   2 +-
 doc/src/sgml/sources.sgml                     |   2 +-
 src/backend/libpq/pqmq.c                      |   6 +-
 .../libpqwalreceiver/libpqwalreceiver.c       |   4 +-
 src/backend/tsearch/wparser_def.c             |   8 +-
 src/backend/utils/adt/arrayutils.c            |   3 +-
 src/backend/utils/adt/int.c                   |   4 +-
 src/backend/utils/adt/int8.c                  |   4 +-
 src/backend/utils/adt/numutils.c              | 175 ++++++++++++++++++
 src/backend/utils/adt/varlena.c               |   4 +-
 src/include/utils/builtins.h                  |   2 +
 .../expected/plpython_subtransaction.out      |   4 +-
 src/pl/plpython/expected/plpython_types.out   |   2 +-
 src/pl/tcl/expected/pltcl_subxact.out         |   6 +-
 src/test/regress/expected/aggregates.out      |   2 +-
 src/test/regress/expected/alter_table.out     |   2 +-
 src/test/regress/expected/copy2.out           |   2 +-
 src/test/regress/expected/int2.out            |  14 +-
 src/test/regress/expected/int4.out            |  14 +-
 src/test/regress/expected/int8.out            |  10 +-
 src/test/regress/expected/plpgsql.out         |   4 +-
 src/test/regress/expected/select_parallel.out |   2 +-
 src/test/regress/regress.c                    |   4 +-
 25 files changed, 233 insertions(+), 57 deletions(-)

diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index dfd49b937e8..6ceabb453c0 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -1154,7 +1154,7 @@ FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int);
 
 SELECT *
 FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int);
-ERROR:  invalid input syntax for integer: "not an int"
+ERROR:  invalid input syntax for type integer: "not an int"
 -- Make sure that the local settings have retained their values in spite
 -- of shenanigans on the connection.
 SHOW datestyle;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index cf4863c5aa2..c321a466114 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4087,16 +4087,16 @@ DROP FUNCTION f_test(int);
 -- ===================================================================
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
 SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  column "c8" of foreign table "ft1"
 SELECT  ft1.c1,  ft2.c2, ft1.c8 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  column "c8" of foreign table "ft1"
 SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  whole-row reference to foreign table "ft1"
 SELECT sum(c2), array_agg(c8) FROM ft1 GROUP BY c8; -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  processing expression at position 2 in select list
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;
 -- ===================================================================
diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index b065ffa400d..f90f2bce0ea 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -306,7 +306,7 @@ check_foreign_key(PG_FUNCTION_ARGS)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: too short %d (< 5) list of arguments", nargs);
 
-	nrefs = pg_atoi(args[0], sizeof(int), 0);
+	nrefs = pg_strtoint32(args[0]);
 	if (nrefs < 1)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: %d (< 1) number of references specified", nrefs);
diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 8870ee938aa..b08919dc70f 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -709,7 +709,7 @@ BETTER: could not open file %s (I/O failure)
     not helpful information.  If the error text doesn't make as much sense
     without the function name, reword it.
 <programlisting>
-BAD:    pg_atoi: error in "z": cannot parse "z"
+BAD:    pg_strtoint32: error in "z": cannot parse "z"
 BETTER: invalid input syntax for integer: "z"
 </programlisting>
    </para>
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 201075dd477..4fbc6b5115d 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -286,10 +286,10 @@ pq_parse_errornotice(StringInfo msg, ErrorData *edata)
 				edata->hint = pstrdup(value);
 				break;
 			case PG_DIAG_STATEMENT_POSITION:
-				edata->cursorpos = pg_atoi(value, sizeof(int), '\0');
+				edata->cursorpos = pg_strtoint32(value);
 				break;
 			case PG_DIAG_INTERNAL_POSITION:
-				edata->internalpos = pg_atoi(value, sizeof(int), '\0');
+				edata->internalpos = pg_strtoint32(value);
 				break;
 			case PG_DIAG_INTERNAL_QUERY:
 				edata->internalquery = pstrdup(value);
@@ -316,7 +316,7 @@ pq_parse_errornotice(StringInfo msg, ErrorData *edata)
 				edata->filename = pstrdup(value);
 				break;
 			case PG_DIAG_SOURCE_LINE:
-				edata->lineno = pg_atoi(value, sizeof(int), '\0');
+				edata->lineno = pg_strtoint32(value);
 				break;
 			case PG_DIAG_SOURCE_FUNCTION:
 				edata->funcname = pstrdup(value);
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index bd489061602..1e1695ef4f4 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -345,7 +345,7 @@ libpqrcv_identify_system(WalReceiverConn *conn, TimeLineID *primary_tli,
 						   ntuples, nfields, 3, 1)));
 	}
 	primary_sysid = pstrdup(PQgetvalue(res, 0, 0));
-	*primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
+	*primary_tli = pg_strtoint32(PQgetvalue(res, 0, 1));
 	PQclear(res);
 
 	*server_version = PQserverVersion(conn->streamConn);
@@ -480,7 +480,7 @@ libpqrcv_endstreaming(WalReceiverConn *conn, TimeLineID *next_tli)
 		if (PQnfields(res) < 2 || PQntuples(res) != 1)
 			ereport(ERROR,
 					(errmsg("unexpected result set after end-of-streaming")));
-		*next_tli = pg_atoi(PQgetvalue(res, 0, 0), sizeof(uint32), 0);
+		*next_tli = pg_strtoint32(PQgetvalue(res, 0, 0));
 		PQclear(res);
 
 		/* the result set should be followed by CommandComplete */
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index f0c34419905..d7cd2e58398 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -2460,13 +2460,13 @@ prsd_headline(PG_FUNCTION_ARGS)
 		char	   *val = defGetString(defel);
 
 		if (pg_strcasecmp(defel->defname, "MaxWords") == 0)
-			max_words = pg_atoi(val, sizeof(int32), 0);
+			max_words = pg_strtoint32(val);
 		else if (pg_strcasecmp(defel->defname, "MinWords") == 0)
-			min_words = pg_atoi(val, sizeof(int32), 0);
+			min_words = pg_strtoint32(val);
 		else if (pg_strcasecmp(defel->defname, "ShortWord") == 0)
-			shortword = pg_atoi(val, sizeof(int32), 0);
+			shortword = pg_strtoint32(val);
 		else if (pg_strcasecmp(defel->defname, "MaxFragments") == 0)
-			max_fragments = pg_atoi(val, sizeof(int32), 0);
+			max_fragments = pg_strtoint32(val);
 		else if (pg_strcasecmp(defel->defname, "StartSel") == 0)
 			prs->startsel = pstrdup(val);
 		else if (pg_strcasecmp(defel->defname, "StopSel") == 0)
diff --git a/src/backend/utils/adt/arrayutils.c b/src/backend/utils/adt/arrayutils.c
index c0d719e98cc..5b98efe76bc 100644
--- a/src/backend/utils/adt/arrayutils.c
+++ b/src/backend/utils/adt/arrayutils.c
@@ -226,8 +226,7 @@ ArrayGetIntegerTypmods(ArrayType *arr, int *n)
 	result = (int32 *) palloc(*n * sizeof(int32));
 
 	for (i = 0; i < *n; i++)
-		result[i] = pg_atoi(DatumGetCString(elem_values[i]),
-							sizeof(int32), '\0');
+		result[i] = pg_strtoint32(DatumGetCString(elem_values[i]));
 
 	pfree(elem_values);
 
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 02783d8d6fe..8149dc1369b 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -60,7 +60,7 @@ int2in(PG_FUNCTION_ARGS)
 {
 	char	   *num = PG_GETARG_CSTRING(0);
 
-	PG_RETURN_INT16(pg_atoi(num, sizeof(int16), '\0'));
+	PG_RETURN_INT16(pg_strtoint16(num));
 }
 
 /*
@@ -265,7 +265,7 @@ int4in(PG_FUNCTION_ARGS)
 {
 	char	   *num = PG_GETARG_CSTRING(0);
 
-	PG_RETURN_INT32(pg_atoi(num, sizeof(int32), '\0'));
+	PG_RETURN_INT32(pg_strtoint32(num));
 }
 
 /*
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 96686ccb2c9..6f0f85358cb 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -122,8 +122,8 @@ invalid_syntax:
 	if (!errorOK)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for integer: \"%s\"",
-						str)));
+				 errmsg("invalid input syntax for type %s: \"%s\"",
+						"bigint", str)));
 	return false;
 }
 
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index b5439f497cc..4929d12b5e7 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -18,6 +18,7 @@
 #include <limits.h>
 #include <ctype.h>
 
+#include "common/int.h"
 #include "utils/builtins.h"
 
 /*
@@ -108,6 +109,180 @@ pg_atoi(const char *s, int size, int c)
 	return (int32) l;
 }
 
+
+/*
+ * Convert input string to a signed 32 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. Will throw
+ * ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int32
+pg_strtoint32(const char *s)
+{
+	const char *in = s;
+	int32		tmp = 0;
+	bool		neg = 0;
+
+	/* skip leading spaces */
+	while (likely(*in) && isspace((unsigned char) *in))
+		in++;
+
+	/* handle sign */
+	if (*in == '-')
+	{
+		in++;
+		neg = true;
+	}
+	else if (*in == '+')
+		in++;
+
+	/* require at least one digit */
+	if (unlikely(!isdigit((unsigned char) *in)))
+		goto err;
+
+	/* process digits */
+	while (true)
+	{
+		if (!*in)
+			goto out;
+		if (!isdigit((unsigned char) *in))
+			goto checkspace;
+
+		/* accumulate input */
+		if (unlikely(pg_mul_s32_overflow(tmp, 10, &tmp)) ||
+			unlikely(pg_sub_s32_overflow(tmp, *in - '0', &tmp)))
+			goto overflow;
+		in++;
+	}
+
+checkspace:
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*in != '\0' && isspace((unsigned char) *in))
+		in++;
+
+	if (unlikely(*in != '\0'))
+		goto err;
+
+out:
+	/*
+	 * Accumulated input as a negative number, so adjust if that's not what's
+	 * needed.
+	 */
+	if (!neg)
+	{
+		/* could fail if input is most negative number */
+		if (unlikely(tmp == PG_INT32_MIN))
+			goto overflow;
+
+		return -tmp;
+	}
+
+	return tmp;
+
+overflow:
+	ereport(ERROR,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value \"%s\" is out of range for type %s",
+					s, "integer")));
+
+err:
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"integer", s)));
+}
+
+
+/*
+ * Convert input string to a signed 16 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. Will throw
+ * ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int16
+pg_strtoint16(const char *s)
+{
+	const char *in = s;
+	int16		tmp = 0;
+	bool		neg = 0;
+
+	/* skip leading spaces */
+	while (likely(*in) && isspace((unsigned char) *in))
+		in++;
+
+	/* handle sign */
+	if (*in == '-')
+	{
+		in++;
+		neg = true;
+	}
+	else if (*in == '+')
+		in++;
+
+	/* require at least one digit */
+	if (unlikely(!isdigit((unsigned char) *in)))
+		goto err;
+
+	/* process digits */
+	while (true)
+	{
+		if (!*in)
+			goto out;
+		if (!isdigit((unsigned char) *in))
+			goto checkspace;
+
+		/* accumulate input */
+		if (unlikely(pg_mul_s16_overflow(tmp, 10, &tmp)) ||
+			unlikely(pg_sub_s16_overflow(tmp, *in - '0', &tmp)))
+			goto overflow;
+		in++;
+	}
+
+checkspace:
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*in != '\0' && isspace((unsigned char) *in))
+		in++;
+
+	if (unlikely(*in != '\0'))
+		goto err;
+
+out:
+	/*
+	 * Accumulated input as a negative number, so adjust if that's not what's
+	 * needed.
+	 */
+	if (!neg)
+	{
+		/* could fail if input is most negative number */
+		if (unlikely(tmp == PG_INT16_MIN))
+			goto overflow;
+
+		return -tmp;
+	}
+
+	return tmp;
+
+overflow:
+	ereport(ERROR,
+			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+			 errmsg("value \"%s\" is out of range for type %s",
+					s, "smallint")));
+
+err:
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for type %s: \"%s\"",
+					"smallint", s)));
+}
+
 /*
  * pg_itoa: converts a signed 16-bit integer to its string representation
  *
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index e8500b274dc..31eaa92c3b7 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5155,8 +5155,8 @@ text_format(PG_FUNCTION_ARGS)
 
 				str = OutputFunctionCall(&typoutputinfo_width, value);
 
-				/* pg_atoi will complain about bad data or overflow */
-				width = pg_atoi(str, sizeof(int), '\0');
+				/* pg_strtoint32 will complain about bad data or overflow */
+				width = pg_strtoint32(str);
 
 				pfree(str);
 			}
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index d0416e90fcc..88a42b345c1 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -43,6 +43,8 @@ extern int	namestrcmp(Name name, const char *str);
 
 /* numutils.c */
 extern int32 pg_atoi(const char *s, int size, int c);
+extern int16 pg_strtoint16(const char *s);
+extern int32 pg_strtoint32(const char *s);
 extern void pg_itoa(int16 i, char *a);
 extern void pg_ltoa(int32 l, char *a);
 extern void pg_lltoa(int64 ll, char *a);
diff --git a/src/pl/plpython/expected/plpython_subtransaction.out b/src/pl/plpython/expected/plpython_subtransaction.out
index b38cde8d2db..069f0992abd 100644
--- a/src/pl/plpython/expected/plpython_subtransaction.out
+++ b/src/pl/plpython/expected/plpython_subtransaction.out
@@ -43,7 +43,7 @@ SELECT * FROM subtransaction_tbl;
 
 TRUNCATE subtransaction_tbl;
 SELECT subtransaction_test('SPI');
-ERROR:  spiexceptions.InvalidTextRepresentation: invalid input syntax for integer: "oops"
+ERROR:  spiexceptions.InvalidTextRepresentation: invalid input syntax for type integer: "oops"
 LINE 1: INSERT INTO subtransaction_tbl VALUES ('oops')
                                                ^
 QUERY:  INSERT INTO subtransaction_tbl VALUES ('oops')
@@ -95,7 +95,7 @@ SELECT * FROM subtransaction_tbl;
 
 TRUNCATE subtransaction_tbl;
 SELECT subtransaction_ctx_test('SPI');
-ERROR:  spiexceptions.InvalidTextRepresentation: invalid input syntax for integer: "oops"
+ERROR:  spiexceptions.InvalidTextRepresentation: invalid input syntax for type integer: "oops"
 LINE 1: INSERT INTO subtransaction_tbl VALUES ('oops')
                                                ^
 QUERY:  INSERT INTO subtransaction_tbl VALUES ('oops')
diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index eda965a9e0d..98b89b7d5c1 100644
--- a/src/pl/plpython/expected/plpython_types.out
+++ b/src/pl/plpython/expected/plpython_types.out
@@ -684,7 +684,7 @@ CREATE FUNCTION test_type_conversion_array_mixed2() RETURNS int[] AS $$
 return [123, 'abc']
 $$ LANGUAGE plpythonu;
 SELECT * FROM test_type_conversion_array_mixed2();
-ERROR:  invalid input syntax for integer: "abc"
+ERROR:  invalid input syntax for type integer: "abc"
 CONTEXT:  while creating return value
 PL/Python function "test_type_conversion_array_mixed2"
 CREATE FUNCTION test_type_conversion_mdarray_malformed() RETURNS int[] AS $$
diff --git a/src/pl/tcl/expected/pltcl_subxact.out b/src/pl/tcl/expected/pltcl_subxact.out
index 4393f4acf69..5e19bbbc636 100644
--- a/src/pl/tcl/expected/pltcl_subxact.out
+++ b/src/pl/tcl/expected/pltcl_subxact.out
@@ -71,9 +71,9 @@ SELECT * FROM subtransaction_tbl;
 
 TRUNCATE subtransaction_tbl;
 SELECT pltcl_wrapper('SELECT subtransaction_ctx_test(''SPI'')');
-                  pltcl_wrapper                  
--------------------------------------------------
- ERROR: invalid input syntax for integer: "oops"
+                    pltcl_wrapper                     
+------------------------------------------------------
+ ERROR: invalid input syntax for type integer: "oops"
 (1 row)
 
 SELECT * FROM subtransaction_tbl;
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index a120dd83f7b..5e216227542 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1674,7 +1674,7 @@ LINE 1: select rank(3) within group (order by stringu1,stringu2) fro...
                ^
 HINT:  To use the hypothetical-set aggregate rank, the number of hypothetical direct arguments (here 1) must match the number of ordering columns (here 2).
 select rank('fred') within group (order by x) from generate_series(1,5) x;
-ERROR:  invalid input syntax for integer: "fred"
+ERROR:  invalid input syntax for type integer: "fred"
 LINE 1: select rank('fred') within group (order by x) from generate_...
                     ^
 select rank('adam'::text collate "C") within group (order by x collate "POSIX")
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index df604a326ca..6c0438cec2f 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -1113,7 +1113,7 @@ select * from def_test;
 
 -- set defaults to an incorrect type: this should fail
 alter table def_test alter column c1 set default 'wrong_datatype';
-ERROR:  invalid input syntax for integer: "wrong_datatype"
+ERROR:  invalid input syntax for type integer: "wrong_datatype"
 alter table def_test alter column c2 set default 20;
 -- set defaults on a non-existent column: this should fail
 alter table def_test alter column c3 set default 30;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e606a5fda47..eb9e4b97741 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -33,7 +33,7 @@ COPY x (a, b, c, d, e, d, c) from stdin;
 ERROR:  column "d" specified more than once
 -- missing data: should fail
 COPY x from stdin;
-ERROR:  invalid input syntax for integer: ""
+ERROR:  invalid input syntax for type integer: ""
 CONTEXT:  COPY x, line 1, column a: ""
 COPY x from stdin;
 ERROR:  missing data for column "e"
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 3ea4ed93a0a..8c255b9e4dd 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -6,7 +6,7 @@ INSERT INTO INT2_TBL(f1) VALUES ('0   ');
 INSERT INTO INT2_TBL(f1) VALUES ('  1234 ');
 INSERT INTO INT2_TBL(f1) VALUES ('    -1234');
 INSERT INTO INT2_TBL(f1) VALUES ('34.5');
-ERROR:  invalid input syntax for integer: "34.5"
+ERROR:  invalid input syntax for type smallint: "34.5"
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('34.5');
                                          ^
 -- largest and smallest values
@@ -18,27 +18,27 @@ ERROR:  value "100000" is out of range for type smallint
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('100000');
                                          ^
 INSERT INTO INT2_TBL(f1) VALUES ('asdf');
-ERROR:  invalid input syntax for integer: "asdf"
+ERROR:  invalid input syntax for type smallint: "asdf"
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('asdf');
                                          ^
 INSERT INTO INT2_TBL(f1) VALUES ('    ');
-ERROR:  invalid input syntax for integer: "    "
+ERROR:  invalid input syntax for type smallint: "    "
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('    ');
                                          ^
 INSERT INTO INT2_TBL(f1) VALUES ('- 1234');
-ERROR:  invalid input syntax for integer: "- 1234"
+ERROR:  invalid input syntax for type smallint: "- 1234"
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('- 1234');
                                          ^
 INSERT INTO INT2_TBL(f1) VALUES ('4 444');
-ERROR:  invalid input syntax for integer: "4 444"
+ERROR:  invalid input syntax for type smallint: "4 444"
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('4 444');
                                          ^
 INSERT INTO INT2_TBL(f1) VALUES ('123 dt');
-ERROR:  invalid input syntax for integer: "123 dt"
+ERROR:  invalid input syntax for type smallint: "123 dt"
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('123 dt');
                                          ^
 INSERT INTO INT2_TBL(f1) VALUES ('');
-ERROR:  invalid input syntax for integer: ""
+ERROR:  invalid input syntax for type smallint: ""
 LINE 1: INSERT INTO INT2_TBL(f1) VALUES ('');
                                          ^
 SELECT '' AS five, * FROM INT2_TBL;
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 372fd4d94c8..bda7a8daefc 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -6,7 +6,7 @@ INSERT INTO INT4_TBL(f1) VALUES ('   0  ');
 INSERT INTO INT4_TBL(f1) VALUES ('123456     ');
 INSERT INTO INT4_TBL(f1) VALUES ('    -123456');
 INSERT INTO INT4_TBL(f1) VALUES ('34.5');
-ERROR:  invalid input syntax for integer: "34.5"
+ERROR:  invalid input syntax for type integer: "34.5"
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('34.5');
                                          ^
 -- largest and smallest values
@@ -18,27 +18,27 @@ ERROR:  value "1000000000000" is out of range for type integer
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('1000000000000');
                                          ^
 INSERT INTO INT4_TBL(f1) VALUES ('asdf');
-ERROR:  invalid input syntax for integer: "asdf"
+ERROR:  invalid input syntax for type integer: "asdf"
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('asdf');
                                          ^
 INSERT INTO INT4_TBL(f1) VALUES ('     ');
-ERROR:  invalid input syntax for integer: "     "
+ERROR:  invalid input syntax for type integer: "     "
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('     ');
                                          ^
 INSERT INTO INT4_TBL(f1) VALUES ('   asdf   ');
-ERROR:  invalid input syntax for integer: "   asdf   "
+ERROR:  invalid input syntax for type integer: "   asdf   "
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('   asdf   ');
                                          ^
 INSERT INTO INT4_TBL(f1) VALUES ('- 1234');
-ERROR:  invalid input syntax for integer: "- 1234"
+ERROR:  invalid input syntax for type integer: "- 1234"
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('- 1234');
                                          ^
 INSERT INTO INT4_TBL(f1) VALUES ('123       5');
-ERROR:  invalid input syntax for integer: "123       5"
+ERROR:  invalid input syntax for type integer: "123       5"
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('123       5');
                                          ^
 INSERT INTO INT4_TBL(f1) VALUES ('');
-ERROR:  invalid input syntax for integer: ""
+ERROR:  invalid input syntax for type integer: ""
 LINE 1: INSERT INTO INT4_TBL(f1) VALUES ('');
                                          ^
 SELECT '' AS five, * FROM INT4_TBL;
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index ed0bd34221e..35e3b3ff818 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -10,11 +10,11 @@ INSERT INTO INT8_TBL VALUES(+4567890123456789,'4567890123456789');
 INSERT INTO INT8_TBL VALUES('+4567890123456789','-4567890123456789');
 -- bad inputs
 INSERT INTO INT8_TBL(q1) VALUES ('      ');
-ERROR:  invalid input syntax for integer: "      "
+ERROR:  invalid input syntax for type bigint: "      "
 LINE 1: INSERT INTO INT8_TBL(q1) VALUES ('      ');
                                          ^
 INSERT INTO INT8_TBL(q1) VALUES ('xxx');
-ERROR:  invalid input syntax for integer: "xxx"
+ERROR:  invalid input syntax for type bigint: "xxx"
 LINE 1: INSERT INTO INT8_TBL(q1) VALUES ('xxx');
                                          ^
 INSERT INTO INT8_TBL(q1) VALUES ('3908203590239580293850293850329485');
@@ -26,15 +26,15 @@ ERROR:  value "-1204982019841029840928340329840934" is out of range for type big
 LINE 1: INSERT INTO INT8_TBL(q1) VALUES ('-1204982019841029840928340...
                                          ^
 INSERT INTO INT8_TBL(q1) VALUES ('- 123');
-ERROR:  invalid input syntax for integer: "- 123"
+ERROR:  invalid input syntax for type bigint: "- 123"
 LINE 1: INSERT INTO INT8_TBL(q1) VALUES ('- 123');
                                          ^
 INSERT INTO INT8_TBL(q1) VALUES ('  345     5');
-ERROR:  invalid input syntax for integer: "  345     5"
+ERROR:  invalid input syntax for type bigint: "  345     5"
 LINE 1: INSERT INTO INT8_TBL(q1) VALUES ('  345     5');
                                          ^
 INSERT INTO INT8_TBL(q1) VALUES ('');
-ERROR:  invalid input syntax for integer: ""
+ERROR:  invalid input syntax for type bigint: ""
 LINE 1: INSERT INTO INT8_TBL(q1) VALUES ('');
                                          ^
 SELECT * FROM INT8_TBL;
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index b687fbfddcc..dde2cc4bd09 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3782,7 +3782,7 @@ begin
 end;
 $$ language plpgsql;
 select compos();
-ERROR:  invalid input syntax for integer: "(1,hello)"
+ERROR:  invalid input syntax for type integer: "(1,hello)"
 CONTEXT:  PL/pgSQL function compos() while casting return value to function's return type
 -- test: invalid use of composite expression in scalar-returning function
 create or replace function compos() returns int as $$
@@ -3791,7 +3791,7 @@ begin
 end;
 $$ language plpgsql;
 select compos();
-ERROR:  invalid input syntax for integer: "(1,hello)"
+ERROR:  invalid input syntax for type integer: "(1,hello)"
 CONTEXT:  PL/pgSQL function compos() while casting return value to function's return type
 drop function compos();
 drop type compostype;
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index cd0b94502d8..f1b8cd43376 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -975,7 +975,7 @@ ROLLBACK TO SAVEPOINT settings;
 SAVEPOINT settings;
 SET LOCAL force_parallel_mode = 1;
 select stringu1::int2 from tenk1 where unique1 = 1;
-ERROR:  invalid input syntax for integer: "BAAAAA"
+ERROR:  invalid input syntax for type smallint: "BAAAAA"
 CONTEXT:  parallel worker
 ROLLBACK TO SAVEPOINT settings;
 -- test interaction with set-returning functions
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 7060b6fbf32..aa224e5dc3e 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -149,8 +149,8 @@ widget_in(PG_FUNCTION_ARGS)
 	if (i < NARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-				 errmsg("invalid input syntax for type widget: \"%s\"",
-						str)));
+				 errmsg("invalid input syntax for type %s: \"%s\"",
+						"widget", str)));
 
 	result = (WIDGET *) palloc(sizeof(WIDGET));
 	result->center.x = atof(coord[0]);
-- 
2.18.0.rc2.dirty

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Andres Freund (#5)

Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund <andres@anarazel.de> wrote:

FWIW, here's a rebased version of this patch. Could probably be polished
further. One might argue that we should do a bit more wide ranging
changes, to convert scanint8 and pg_atoi to be also unified. But it
might also just be worthwhile to apply without those, given the
performance benefit.

Wouldn't hurt to do that one too, but might be OK to just do this
much. Questions:

1. Why the error message changes? If there's a good reason, it should
be done as a separate commit, or at least well-documented in the
commit message.

2. Does the likely/unlikely stuff make a noticeable difference?

3. If this is a drop-in replacement for pg_atoi, why not just recode
pg_atoi this way -- or have it call this -- and leave the callers
unchanged?

4. Are we sure this is faster on all platforms, or could it work out
the other way on, say, BSD?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Robert Haas (#6)

Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

Hi,

On 2018-07-18 14:34:34 -0400, Robert Haas wrote:

On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund <andres@anarazel.de> wrote:

FWIW, here's a rebased version of this patch. Could probably be polished
further. One might argue that we should do a bit more wide ranging
changes, to convert scanint8 and pg_atoi to be also unified. But it
might also just be worthwhile to apply without those, given the
performance benefit.

Wouldn't hurt to do that one too, but might be OK to just do this
much. Questions:

1. Why the error message changes? If there's a good reason, it should
be done as a separate commit, or at least well-documented in the
commit message.

Because there are a lot of "invalid input syntax for type %s: \"%s\""
error messages, and we shouldn't force translators to maintain a separate
version that inlines the first %s. But you're right, it'd be worthwhile
to point that out in the commit message.
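
To illustrate the translation point, here is a minimal standalone sketch (not part of the patch). It assumes a hypothetical helper, format_invalid_syntax(), and uses plain snprintf() in place of ereport()/errmsg(). The only thing taken from the patch is the format string itself: one translatable template serves every type because the type name is passed as an argument.

```c
#include <stdio.h>

/* The single shared template, as used in the patch's errmsg() calls. */
#define INVALID_SYNTAX_FMT "invalid input syntax for type %s: \"%s\""

/*
 * Hypothetical helper: formats the message for any type name.
 * Returns the number of characters snprintf() would have written.
 */
int
format_invalid_syntax(char *buf, size_t buflen,
                      const char *type_name, const char *input)
{
    return snprintf(buf, buflen, INVALID_SYNTAX_FMT, type_name, input);
}
```

Calling it with "smallint", "integer", or "bigint" yields the three message variants that appear in the updated regression output, while translators only ever see the one template.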

2. Does the likely/unlikely stuff make a noticeable difference?

Yes. It's also largely a copy from existing code (scanint8), so I don't
really want to differ here.

3. If this is a drop-in replacement for pg_atoi, why not just recode
pg_atoi this way -- or have it call this -- and leave the callers
unchanged?

Because pg_atoi supports a variable 'terminator'. Supporting that would
produce somewhat slower code, without being particularly useful. I think
there's only a single in-core caller left after the patch
(int2vectorin). There's a fair argument that that should just be
open-coded to handle the weird space parsing, but given there are probably
external pg_atoi() callers, I'm not sure it's worth doing so?

I don't think it's a good idea to continue to have pg_atoi as a wrapper
- it takes a size argument, which makes efficient code hard.
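
A rough sketch of why a run-time size argument gets in the way (an illustration only, not code from the patch or from pg_atoi; both function names are hypothetical). With a size parameter the valid range is only known at run time, so the check has to branch on it. A per-width function has its bounds as compile-time constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic shape: the caller passes the target width in bytes. */
bool
in_range_generic(int64_t value, int size)
{
    switch (size)
    {
        case sizeof(int16_t):
            return value >= INT16_MIN && value <= INT16_MAX;
        case sizeof(int32_t):
            return value >= INT32_MIN && value <= INT32_MAX;
        default:
            return false;       /* unsupported width */
    }
}

/* Specialized shape: bounds are constants, no dispatch on size. */
bool
in_range_int16(int64_t value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

The specialized form is what pg_strtoint16() and pg_strtoint32() amount to: each one is written for exactly one width.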

4. Are we sure this is faster on all platforms, or could it work out
the other way on, say, BSD?

I'd be *VERY* surprised if any would be faster. It's not easy to write a
faster implementation than what I've proposed, and especially not if
you use strtol() as the API (variable bases, a bit of locale support).
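
For readers who want to try the approach outside the backend, here is a minimal standalone sketch of the same idea (an approximation, not the patch itself). It assumes a hypothetical parse_int32() that returns false instead of calling ereport(), and it replaces pg_mul_s32_overflow()/pg_sub_s32_overflow() with a hand-written bounds check. As in the patch, the value is accumulated as a negative number so that INT32_MIN can be represented.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical analogue of pg_strtoint32(): base 10 only, optional sign,
 * leading/trailing whitespace allowed. Returns false on bad syntax or
 * overflow; on success stores the value in *result.
 */
bool
parse_int32(const char *s, int32_t *result)
{
    int32_t tmp = 0;
    bool    neg = false;

    /* skip leading whitespace */
    while (*s && isspace((unsigned char) *s))
        s++;

    /* handle sign */
    if (*s == '-')
    {
        neg = true;
        s++;
    }
    else if (*s == '+')
        s++;

    /* require at least one digit */
    if (!isdigit((unsigned char) *s))
        return false;

    /* accumulate digits as a negative number */
    for (; isdigit((unsigned char) *s); s++)
    {
        int digit = *s - '0';

        /* would tmp * 10 - digit fall below INT32_MIN? */
        if (tmp < (INT32_MIN + digit) / 10)
            return false;
        tmp = tmp * 10 - digit;
    }

    /* allow trailing whitespace, but nothing else */
    while (*s && isspace((unsigned char) *s))
        s++;
    if (*s != '\0')
        return false;

    if (!neg)
    {
        /* -INT32_MIN is not representable */
        if (tmp == INT32_MIN)
            return false;
        tmp = -tmp;
    }

    *result = tmp;
    return true;
}
```

Unlike strtol(), this has no base parameter and no errno handling, which is roughly where the savings are expected to come from. Whether this particular sketch is faster on a given platform would need to be measured.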

Greetings,

Andres Freund

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Andres Freund (#7)

Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

On Thu, Jul 19, 2018 at 4:32 PM, Andres Freund <andres@anarazel.de> wrote:

1. Why the error message changes? If there's a good reason, it should
be done as a separate commit, or at least well-documented in the
commit message.

Because there are a lot of "invalid input syntax for type %s: \"%s\""
error messages, and we shouldn't force translators to maintain a separate
version that inlines the first %s. But you're right, it'd be worthwhile
to point that out in the commit message.

It just seems weird that they're bundled together in one commit like this.

2. Does the likely/unlikely stuff make a noticeable difference?

Yes. It's also largely a copy from existing code (scanint8), so I don't
really want to differ here.

OK.

3. If this is a drop-in replacement for pg_atoi, why not just recode
pg_atoi this way -- or have it call this -- and leave the callers
unchanged?

Because pg_atoi supports a variable 'terminator'.

OK.

4. Are we sure this is faster on all platforms, or could it work out
the other way on, say, BSD?

I'd be *VERY* surprised if any would be faster. It's not easy to write a
faster implementation than what I've proposed, and especially not if
you use strtol() as the API (variable bases, a bit of locale support).

OK.

Nothing else from me...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Robert Haas (#8)

Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

Hi,

On 2018-07-20 08:27:34 -0400, Robert Haas wrote:

On Thu, Jul 19, 2018 at 4:32 PM, Andres Freund <andres@anarazel.de> wrote:

1. Why the error message changes? If there's a good reason, it should
be done as a separate commit, or at least well-documented in the
commit message.

Because there are a lot of "invalid input syntax for type %s: \"%s\""
error messages, and we shouldn't force translators to maintain a separate
version that inlines the first %s. But you're right, it'd be worthwhile
to point that out in the commit message.

It just seems weird that they're bundled together in one commit like this.

I'll push it separately.

Nothing else from me...

Thanks for looking!

Greetings,

Andres Freund