Optimizing COPY with SIMD

Started by Neil Conway over 1 year ago, 7 messages

neil.conway@gmail.com

over 1 year ago

5 attachment(s)

Inspired by David Rowley's work [1] on optimizing JSON escape processing
with SIMD, I noticed that the COPY code could potentially benefit from SIMD
instructions in a few places, e.g.:

(1) CopyAttributeOutCSV() has 2 byte-by-byte loops
(2) CopyAttributeOutText() has 1
(3) CopyReadLineText() has 1
(4) CopyReadAttributesCSV() has 1
(5) CopyReadAttributesText() has 1
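Each of these loops has the same shape: walk the bytes looking for a handful of special characters. A chunked version tests many bytes at once and only drops to byte-at-a-time work for the short tail. Below is a minimal, portable sketch of that idea for the CSV "does this field need quoting?" check from case (1). It is an illustration under stated assumptions, not PostgreSQL code: csv_needs_quote() is a hypothetical name, 8-byte words plus the well-known "has zero byte" bit trick stand in for the Vector8 type and vector8_has() from port/simd.h, COPY's other quoting rules (matching null_print, a lone \.) are left out, and it is only safe for encodings whose multibyte characters never contain ASCII bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Broadcast byte b into all 8 lanes of a 64-bit word. */
static inline uint64_t
broadcast8(unsigned char b)
{
	return UINT64_C(0x0101010101010101) * b;
}

/*
 * True iff some byte of w equals b: apply the "has zero byte" trick to
 * w XOR broadcast8(b).  The yes/no answer is exact (SWAR stand-in for
 * vector8_has(); an assumption, not the port/simd.h implementation).
 */
static inline bool
word_has_byte(uint64_t w, unsigned char b)
{
	uint64_t	x = w ^ broadcast8(b);

	return ((x - UINT64_C(0x0101010101010101)) & ~x &
			UINT64_C(0x8080808080808080)) != 0;
}

static inline bool
is_csv_special(char c, char delimc, char quotec)
{
	return c == delimc || c == quotec || c == '\n' || c == '\r';
}

/* Hypothetical helper: does this CSV field need quoting? */
static bool
csv_needs_quote(const char *ptr, char delimc, char quotec)
{
	size_t		len;
	size_t		vlen;
	size_t		i;

	assert(ptr != NULL);
	len = strlen(ptr);
	/* Round len down to a whole number of 8-byte chunks. */
	vlen = len & ~(sizeof(uint64_t) - 1);

	for (i = 0; i < vlen; i += sizeof(uint64_t))
	{
		uint64_t	w;

		memcpy(&w, ptr + i, sizeof(w));	/* alignment-safe load */
		if (word_has_byte(w, (unsigned char) delimc) ||
			word_has_byte(w, (unsigned char) quotec) ||
			word_has_byte(w, '\n') ||
			word_has_byte(w, '\r'))
			return true;
	}

	/* Tail shorter than one chunk: plain byte-at-a-time check. */
	for (; i < len; i++)
		if (is_csv_special(ptr[i], delimc, quotec))
			return true;

	return false;
}
```

For example, csv_needs_quote("0123456789,abc", ',', '"') returns true via the scalar tail loop, because only the first 8 bytes form a full chunk.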

Attached is a quick POC patch that uses SIMD instructions for case (1)
above. For sufficiently large attribute values, this is a significant
performance win. For small fields, performance looks to be about the same.
Results are from an M1 MacBook Pro.

======
neilconway=# select count(*), avg(length(a))::int, avg(length(b))::int,
avg(length(c))::int from short_strings;
count | avg | avg | avg
--------+-----+-----+-----
524288 | 8 | 8 | 8
(1 row)

neilconway=# select count(*), avg(length(a))::int, avg(length(b))::int,
avg(length(c))::int from long_strings;
count | avg | avg | avg
-------+-----+-----+-----
65536 | 657 | 657 | 657
(1 row)

master @ 8fea1bd541:

$ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
Time (mean ± σ): 2.027 s ± 0.075 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.928 s … 2.207 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
Time (mean ± σ): 1.420 s ± 0.027 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.379 s … 1.473 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
Time (mean ± σ): 546.0 ms ± 9.6 ms [User: 1.4 ms, System: 0.3 ms]
Range (min … max): 539.0 ms … 572.1 ms 10 runs

master + SIMD patch:

$ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
Time (mean ± σ): 797.8 ms ± 19.4 ms [User: 0.9 ms, System: 0.0 ms]
Range (min … max): 770.0 ms … 828.5 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
Time (mean ± σ): 732.3 ms ± 20.8 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 701.1 ms … 763.5 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
Time (mean ± σ): 545.7 ms ± 13.5 ms [User: 1.3 ms, System: 0.1 ms]
Range (min … max): 533.6 ms … 580.2 ms 10 runs
======

Implementation-wise, it seems complex to use SIMD when
encoding_embeds_ascii is true (which should be uncommon). In principle, we
could probably still use SIMD here, but it would require juggling between
the SIMD chunk size and sizes returned by pg_encoding_mblen(). For now, the
POC patch falls back to the old code path when encoding_embeds_ascii is
true.
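For context, encoding_embeds_ascii covers encodings in which a trailing byte of a multibyte character can coincide with an ASCII byte. Shift_JIS is a classic example: the character 表 is encoded as 0x95 0x5C, and 0x5C is '\'. A byte-wise (or SIMD) compare reports a backslash that is not really there, while an encoding-aware scan steps over whole characters; a fixed-size chunk can also split such a character across its boundary. The sketch below only illustrates that difference. sjis_mblen_sketch() is a hypothetical, simplified stand-in for pg_encoding_mblen(), not PostgreSQL's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified Shift_JIS rule (an assumption for illustration): lead bytes
 * 0x81-0x9F and 0xE0-0xFC start a two-byte character; all else is 1 byte.
 */
static int
sjis_mblen_sketch(const unsigned char *s)
{
	if ((*s >= 0x81 && *s <= 0x9F) || (*s >= 0xE0 && *s <= 0xFC))
		return 2;
	return 1;
}

/* Byte-wise scan: effectively what a chunked SIMD compare does. */
static bool
naive_has_backslash(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (s[i] == '\\')
			return true;
	return false;
}

/* Encoding-aware scan: only character-initial bytes are compared. */
static bool
mb_has_backslash(const char *s, size_t len)
{
	size_t		i = 0;

	assert(s != NULL || len == 0);
	while (i < len)
	{
		if (s[i] == '\\')
			return true;
		i += sjis_mblen_sketch((const unsigned char *) s + i);
	}
	return false;
}
```

On the two bytes of 表, naive_has_backslash() reports a match and mb_has_backslash() does not, which is why the fast path is skipped for such encodings.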

Any feedback would be very welcome.

Cheers,
Neil

[1]: /messages/by-id/CAApHDvpLXwMZvbCKcdGfU9XQjGCDm7tFpRdTXuB9PVgpNUYfEQ@mail.gmail.com

Attachments:

0002-Optimize-COPY-TO-.-FORMAT-CSV-using-SIMD-instruction.patch

From ba4e26a0086dc5b49405a5ac8b495a64f1c2bb49 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 14:00:58 -0400
Subject: [PATCH 2/2] Optimize COPY TO ... FORMAT CSV using SIMD instructions

CopyAttributeOutCSV() does one or two byte-by-byte loops over the text of each
attribute, depending on whether quotation is required. Implementing these loops
using SIMD yields a significant speedup for long attribute values. For short
attribute values, performance is roughly unchanged.

Using SIMD when encoding_embeds_ascii is true seems quite complex, so for now we
just use the old code path for such encodings.
---
 src/backend/commands/copyto.c | 145 +++++++++++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 48957d8c3e..a62c4ec120 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -29,6 +29,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -1124,6 +1125,137 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	DUMPSOFAR();
 }
 
+/*
+ * Send text representation of one attribute, with conversion and CSV-style
+ * escaping. This variant uses SIMD instructions to optimize processing, but
+ * we can only use this approach when encoding_embeds_ascii is false.
+ */
+static void
+CopyAttributeOutCSVFast(CopyToState cstate, const char *ptr,
+						bool use_quote)
+{
+	int			len;
+	int			vlen;
+	char		delimc = cstate->opts.delim[0];
+	char		quotec = cstate->opts.quote[0];
+	char		escapec = cstate->opts.escape[0];
+
+	len = strlen(ptr);
+	vlen = len & (int) (~(sizeof(Vector8) - 1));
+
+	/*
+	 * Make a preliminary pass to discover if it needs quoting
+	 */
+	if (!use_quote)
+	{
+		bool	single_attr = (list_length(cstate->attnumlist) == 1);
+
+		/*
+		 * Because '\.' can be a data value, quote it if it appears alone on a
+		 * line so it is not interpreted as the end-of-data marker.
+		 */
+		if (single_attr && strcmp(ptr, "\\.") == 0)
+			use_quote = true;
+		else
+		{
+			int		i;
+			Vector8 chunk;
+
+			for (i = 0; i < vlen; i += sizeof(Vector8))
+			{
+				vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+				if (vector8_has(chunk, (unsigned char) delimc) ||
+					vector8_has(chunk, (unsigned char) quotec) ||
+					vector8_has(chunk, (unsigned char) '\n') ||
+					vector8_has(chunk, (unsigned char) '\r'))
+				{
+					use_quote = true;
+					break;
+				}
+			}
+
+			/* Check the tail of the string */
+			if (!use_quote)
+			{
+				for (; i < len; i++)
+				{
+					char c = ptr[i];
+
+					if (c == delimc || c == quotec || c == '\n' || c == '\r')
+					{
+						use_quote = true;
+						break;
+					}
+				}
+			}
+		}
+	}
+
+	if (use_quote)
+	{
+		int		i;
+		int		start_idx = 0;
+		Vector8 chunk;
+
+		CopySendChar(cstate, quotec);
+
+		for (i = 0; i < vlen; i += sizeof(Vector8))
+		{
+			vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+			if (vector8_has(chunk, (unsigned char) escapec) ||
+				vector8_has(chunk, (unsigned char) quotec))
+			{
+				/*
+				 * This chunk has one or more characters that require
+				 * escaping, so switch to byte-at-a-time processing
+				 */
+				for (int j = i; j < (i + sizeof(Vector8)); j++)
+				{
+					char c = ptr[j];
+
+					if (c == quotec || c == escapec)
+					{
+						if (j > start_idx)
+							CopySendData(cstate, ptr + start_idx, j - start_idx);
+
+						CopySendChar(cstate, escapec);
+						start_idx = j;
+					}
+				}
+			}
+		}
+
+		/* Process the tail of the string */
+		for (; i < len; i++)
+		{
+			char c = ptr[i];
+
+			if (c == quotec || c == escapec)
+			{
+				if (i > start_idx)
+					CopySendData(cstate, ptr + start_idx, i - start_idx);
+
+				CopySendChar(cstate, escapec);
+				start_idx = i;
+			}
+		}
+
+		/* Send any remaining text */
+		if (start_idx < len)
+			CopySendData(cstate, ptr + start_idx, len - start_idx);
+
+		CopySendChar(cstate, quotec);
+	}
+	else
+	{
+		/* If it doesn't need quoting, we can just dump it as-is */
+		CopySendData(cstate, ptr, len);
+	}
+}
+
+
 /*
  * Send text representation of one attribute, with conversion and
  * CSV-style escaping
@@ -1138,7 +1270,6 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	char		delimc = cstate->opts.delim[0];
 	char		quotec = cstate->opts.quote[0];
 	char		escapec = cstate->opts.escape[0];
-	bool		single_attr = (list_length(cstate->attnumlist) == 1);
 
 	/* force quoting if it matches null_print (before conversion!) */
 	if (!use_quote && strcmp(string, cstate->opts.null_print) == 0)
@@ -1149,11 +1280,19 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	else
 		ptr = string;
 
+	if (!cstate->encoding_embeds_ascii)
+	{
+		CopyAttributeOutCSVFast(cstate, ptr, use_quote);
+		return;
+	}
+
 	/*
 	 * Make a preliminary pass to discover if it needs quoting
 	 */
 	if (!use_quote)
 	{
+		bool	single_attr = (list_length(cstate->attnumlist) == 1);
+
 		/*
 		 * Because '\.' can be a data value, quote it if it appears alone on a
 		 * line so it is not interpreted as the end-of-data marker.
@@ -1171,7 +1310,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 					use_quote = true;
 					break;
 				}
-				if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+				if (IS_HIGHBIT_SET(c))
 					tptr += pg_encoding_mblen(cstate->file_encoding, tptr);
 				else
 					tptr++;
@@ -1195,7 +1334,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 				CopySendChar(cstate, escapec);
 				start = ptr;	/* we include char in next run */
 			}
-			if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+			if (IS_HIGHBIT_SET(c))
 				ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
 			else
 				ptr++;
-- 
2.39.3 (Apple Git-146)

0001-Remove-inaccurate-comment.patch

From 3981f7507fd828569e5ba4cec681c0c48e6db670 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 12:01:48 -0400
Subject: [PATCH 1/2] Remove inaccurate comment

---
 src/backend/commands/copyto.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36d7..48957d8c3e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -969,9 +969,6 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldcontext);
 }
 
-/*
- * Send text representation of one attribute, with conversion and escaping
- */
 #define DUMPSOFAR() \
 	do { \
 		if (ptr > start) \
-- 
2.39.3 (Apple Git-146)

copy-out-bench-short.sql

copy-out-bench-long.sql

copy-out-bench-long-quotes.sql

Joe Conway

mail@joeconway.com

over 1 year ago

In reply to: Neil Conway (#1)

Re: Optimizing COPY with SIMD

On 6/2/24 15:17, Neil Conway wrote:

Inspired by David Rowley's work [1]

Welcome back!

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Neil Conway

neil.conway@gmail.com

over 1 year ago

In reply to: Joe Conway (#2)

Re: Optimizing COPY with SIMD

On Mon, Jun 3, 2024 at 9:22 AM Joe Conway <mail@joeconway.com> wrote:

Welcome back!

Thanks Joe! It's been a minute :)

Neil

Nathan Bossart

nathandbossart@gmail.com

over 1 year ago

In reply to: Neil Conway (#1)

Re: Optimizing COPY with SIMD

On Sun, Jun 02, 2024 at 03:17:21PM -0400, Neil Conway wrote:

master @ 8fea1bd541:

$ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
Time (mean ± σ): 2.027 s ± 0.075 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.928 s … 2.207 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
Time (mean ± σ): 1.420 s ± 0.027 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.379 s … 1.473 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
Time (mean ± σ): 546.0 ms ± 9.6 ms [User: 1.4 ms, System: 0.3 ms]
Range (min … max): 539.0 ms … 572.1 ms 10 runs

master + SIMD patch:

$ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
Time (mean ± σ): 797.8 ms ± 19.4 ms [User: 0.9 ms, System: 0.0 ms]
Range (min … max): 770.0 ms … 828.5 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
Time (mean ± σ): 732.3 ms ± 20.8 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 701.1 ms … 763.5 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
Time (mean ± σ): 545.7 ms ± 13.5 ms [User: 1.3 ms, System: 0.1 ms]
Range (min … max): 533.6 ms … 580.2 ms 10 runs

These are nice results.

-/*
- * Send text representation of one attribute, with conversion and escaping
- */
#define DUMPSOFAR() \

IIUC this comment was meant to describe the CopyAttributeOutText() function
just below this macro. When the macro was added in commit 0a5fdb0 from
2006, the comment became detached from the function. Maybe we should just
move it back down below the macro.

+/*
+ * Send text representation of one attribute, with conversion and CSV-style
+ * escaping. This variant uses SIMD instructions to optimize processing, but
+ * we can only use this approach when encoding_embeds_ascii is false.
+ */

nitpick: Can we add a few words about why using SIMD instructions when
encoding_embeds_ascii is true is difficult? I don't dispute that it is
complex and/or not worth the effort, but it's not clear to me why that's
the case just from reading the patch.

+static void
+CopyAttributeOutCSVFast(CopyToState cstate, const char *ptr,
+						bool use_quote)

nitpick: Can we add "vector" or "simd" to the name instead of "fast"? IMHO
it's better to be more descriptive.

At a glance, the code looks pretty reasonable to me. I might have some
other nitpicks, such as styling tricks to avoid too many levels of
indentation, but that's not terribly important.

--
nathan

Neil Conway

neil.conway@gmail.com

over 1 year ago

In reply to: Nathan Bossart (#4)

4 attachment(s)

Re: Optimizing COPY with SIMD

Thanks for the review and feedback!

On Mon, Jun 3, 2024 at 10:56 AM Nathan Bossart <nathandbossart@gmail.com>
wrote:

-/*
- * Send text representation of one attribute, with conversion and escaping
- */
#define DUMPSOFAR() \

IIUC this comment was meant to describe the CopyAttributeOutText() function
just below this macro. When the macro was added in commit 0a5fdb0 from
2006, the comment became detached from the function. Maybe we should just
move it back down below the macro.

Ah, that makes sense -- done.

+/*
+ * Send text representation of one attribute, with conversion and CSV-style
+ * escaping. This variant uses SIMD instructions to optimize processing, but
+ * we can only use this approach when encoding_embeds_ascii is false.
+ */

nitpick: Can we add a few words about why using SIMD instructions when
encoding_embeds_ascii is true is difficult? I don't dispute that it is
complex and/or not worth the effort, but it's not clear to me why that's
the case just from reading the patch.

Sounds good.

+static void
+CopyAttributeOutCSVFast(CopyToState cstate, const char *ptr,
+                                             bool use_quote)
nitpick: Can we add "vector" or "simd" to the name instead of "fast"? IMHO
it's better to be more descriptive.

Sure, done.

Attached is a revised patch series that incorporates the feedback above
and makes two additional changes:

* Add some regression tests to cover COPY behavior with octal and hex
escape sequences
* Optimize the COPY TO text (non-CSV) code path (CopyAttributeOutText()).

In CopyAttributeOutText(), I refactored some code into a helper function to
reduce code duplication, on the theory that field delimiters and escape
sequences are rare, so we don't mind taking a function call in those cases.

We could go further and use the same code to handle both the tail of the
string in the vectorized case and the entire string in the non-vectorized
case, but I didn't bother with that -- as written, it would require taking
an unnecessary strlen() of the input string in the non-vectorized case.
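A simplified, scalar sketch of that structure: the hot loop only classifies bytes and flushes each run of ordinary bytes with a single call, while a separate helper (modeled on the patch's EmitTextCharacter()) handles the rare bytes that need escaping. Because it stops at the terminating NUL, this non-vectorized form needs no strlen(); a vectorized loop needs the length up front to know how many full chunks there are. OutBuf, out_data(), out_char(), emit_text_char() and escape_text() are hypothetical names standing in for CopySendData(), CopySendChar() and the real COPY code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer standing in for COPY's output. */
typedef struct OutBuf
{
	char		data[256];
	size_t		len;
} OutBuf;

static void
out_data(OutBuf *buf, const char *src, size_t n)
{
	assert(buf->len + n < sizeof(buf->data));
	memcpy(buf->data + buf->len, src, n);
	buf->len += n;
	buf->data[buf->len] = '\0';
}

static void
out_char(OutBuf *buf, char c)
{
	out_data(buf, &c, 1);
}

/* Rare path: escape one special byte (modeled on EmitTextCharacter()). */
static void
emit_text_char(OutBuf *buf, char c, char delimc)
{
	char		esc = 0;

	switch (c)
	{
		case '\b': esc = 'b'; break;
		case '\f': esc = 'f'; break;
		case '\n': esc = 'n'; break;
		case '\r': esc = 'r'; break;
		case '\t': esc = 't'; break;
		case '\v': esc = 'v'; break;
		default:
			/* backslash and the delimiter are backslash-escaped */
			if (c == '\\' || c == delimc)
				esc = c;
			break;
	}
	if (esc)
	{
		out_char(buf, '\\');
		out_char(buf, esc);
	}
	else
		out_char(buf, c);		/* other control chars go out as-is */
}

static int
is_special(char c, char delimc)
{
	return (unsigned char) c < 0x20 || c == '\\' || c == delimc;
}

/* Hot path: send runs of ordinary bytes with one out_data() call each. */
static void
escape_text(OutBuf *buf, const char *ptr, char delimc)
{
	size_t		start = 0;
	size_t		i;

	for (i = 0; ptr[i] != '\0'; i++)
	{
		if (!is_special(ptr[i], delimc))
			continue;
		if (i > start)
			out_data(buf, ptr + start, i - start);	/* flush literal run */
		emit_text_char(buf, ptr[i], delimc);
		start = i + 1;
	}
	if (i > start)
		out_data(buf, ptr + start, i - start);
}
```

With a tab delimiter, escape_text() turns a, TAB, b, backslash, c into a\tb\\c while sending "a" and "b" as separate literal runs.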

Performance for COPY TO in text (non-CSV) mode:

===
master

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-long-strings.sql
Time (mean ± σ): 1.240 s ± 0.013 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.220 s … 1.256 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-short.sql
Time (mean ± σ): 522.3 ms ± 11.3 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 512.0 ms … 544.3 ms 10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-long-strings.sql
Time (mean ± σ): 867.6 ms ± 12.7 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 842.1 ms … 891.6 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-short.sql
Time (mean ± σ): 536.7 ms ± 10.9 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 530.1 ms … 566.8 ms 10 runs
===

Looks like there is a slight regression for short attribute values, but I
think the tradeoff is a net win.
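To put numbers on that tradeoff, the ratios of the mean times above can be computed directly. The helper names below are only illustrative, and the calculation uses the means alone, ignoring the reported σ. For long strings, 1.240 s versus 867.6 ms is roughly a 1.43x speedup (about 30% less time); for short strings, 522.3 ms versus 536.7 ms is about 2.8% more time, a difference of 14.4 ms, which is of the same order as the reported standard deviations (about 11 ms).

```c
#include <assert.h>

/* Ratio of mean times: values above 1.0 mean the patched build is faster. */
static double
speedup(double before_ms, double after_ms)
{
	assert(after_ms > 0.0);
	return before_ms / after_ms;
}

/* Percentage change in mean time: positive means the patch is slower. */
static double
pct_time_change(double before_ms, double after_ms)
{
	assert(before_ms > 0.0);
	return (after_ms - before_ms) / before_ms * 100.0;
}
```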

I'm going to take a look at applying similar ideas to COPY FROM next.

Neil

Attachments:

v2-0001-Adjust-misleading-comment-placement.patch

From 7c8742685dee201f4fc182143dbcc6587f90a8f9 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 12:01:48 -0400
Subject: [PATCH v2 1/4] Adjust misleading comment placement

---
 src/backend/commands/copyto.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36d7..cd2d7bb217 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -969,15 +969,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldcontext);
 }
 
-/*
- * Send text representation of one attribute, with conversion and escaping
- */
 #define DUMPSOFAR() \
 	do { \
 		if (ptr > start) \
 			CopySendData(cstate, start, ptr - start); \
 	} while (0)
 
+/*
+ * Send text representation of one attribute, with conversion and escaping
+ */
 static void
 CopyAttributeOutText(CopyToState cstate, const char *string)
 {
-- 
2.39.3 (Apple Git-146)

v2-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch

From 39650379c974d64631960510be12d88595249641 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 18:50:30 -0400
Subject: [PATCH v2 2/4] Improve COPY test coverage for handling of control
 characters

---
 src/test/regress/expected/copy2.out | 38 +++++++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql      | 25 +++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 931542f268..be654daf4a 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -905,3 +905,41 @@ truncate copy_default;
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
 ERROR:  COPY DEFAULT only available using COPY FROM
+-- Test handling of control characters
+create temp table copy_ctl (a text, b text);
+copy copy_ctl from stdin;
+copy copy_ctl to stdout;
+abc	def
+\n	def
+abc	\n
+\t	def
+abc	\t
+ab\b	\bdef
+a\vbc	def\v
+\f\f\f	g\fg
+\f	def
+abc	
+	def
+abc	
+\\	\n\t\\
+9	9999
+S4	j
+copy copy_ctl to stdout with (format csv);
+abc,def
+"
+",def
+abc,"
+"
+	,def
+abc,	
+ab,def
+abc,def
+,gg
+,def
+abc,
+,def
+abc,
+\,"
+	\"
+9,9999
+S4,j
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..e8d495e024 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -676,3 +676,28 @@ truncate copy_default;
 
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
+
+-- Test handling of control characters
+create temp table copy_ctl (a text, b text);
+
+copy copy_ctl from stdin;
+abc	def
+\n	def
+abc	\n
+\t	def
+abc	\t
+ab\b	\bdef
+a\vbc	def\v
+\f\f\f	g\fg
+\x0c	def
+abc	\x6
+\6	def
+abc	\017
+\\	\
+\	\\
+\9	\9999
+\1234	\j
+\.
+
+copy copy_ctl to stdout;
+copy copy_ctl to stdout with (format csv);
-- 
2.39.3 (Apple Git-146)

v2-0003-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch

From a00c3354b126e39ce057d910a3a040f96e1491d3 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 14:00:58 -0400
Subject: [PATCH v2 3/4] Optimize COPY TO in CSV format using SIMD

CopyAttributeOutCSV() does one or two byte-by-byte loops over the text of each
attribute, depending on whether quotation is required. Implementing these loops
using SIMD yields a significant speedup for long attribute values. For short
attribute values, performance is roughly unchanged.

We don't attempt to apply this optimization when encoding_embeds_ascii is true,
because the required bookkeeping would be complicated.
---
 src/backend/commands/copyto.c | 152 +++++++++++++++++++++++++++++++++-
 1 file changed, 149 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cd2d7bb217..9114bb1c48 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -29,6 +29,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -1127,6 +1128,144 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	DUMPSOFAR();
 }
 
+/*
+ * Send text representation of one attribute, with conversion and CSV-style
+ * escaping.  This is significantly faster for wide attributes, assuming that
+ * control characters are rare.
+ *
+ * This variant assumes that encoding_embeds_ascii is false.  This simplifies
+ * the implementation because we can look at arbitrary-sized chunks of bytes,
+ * without needing to go through the pg_encoding_mblen() machinery to ensure
+ * that multibyte characters don't cross chunk boundaries.  In principle we
+ * could combine vectorization with such encodings, but the bookkeeping
+ * required would be complicated.
+ */
+static void
+CopyAttributeOutCSVVector(CopyToState cstate, const char *ptr,
+						  bool use_quote)
+{
+	int			len;
+	int			vlen;
+	char		delimc = cstate->opts.delim[0];
+	char		quotec = cstate->opts.quote[0];
+	char		escapec = cstate->opts.escape[0];
+
+	len = strlen(ptr);
+	vlen = len & (int) (~(sizeof(Vector8) - 1));
+
+	/*
+	 * Make a preliminary pass to discover if it needs quoting
+	 */
+	if (!use_quote)
+	{
+		bool	single_attr = (list_length(cstate->attnumlist) == 1);
+
+		/*
+		 * Because '\.' can be a data value, quote it if it appears alone on a
+		 * line so it is not interpreted as the end-of-data marker.
+		 */
+		if (single_attr && strcmp(ptr, "\\.") == 0)
+			use_quote = true;
+		else
+		{
+			int		i;
+			Vector8 chunk;
+
+			for (i = 0; i < vlen; i += sizeof(Vector8))
+			{
+				vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+				if (vector8_has(chunk, (unsigned char) delimc) ||
+					vector8_has(chunk, (unsigned char) quotec) ||
+					vector8_has(chunk, (unsigned char) '\n') ||
+					vector8_has(chunk, (unsigned char) '\r'))
+				{
+					use_quote = true;
+					break;
+				}
+			}
+
+			/* Check the tail of the string */
+			if (!use_quote)
+			{
+				for (; i < len; i++)
+				{
+					char c = ptr[i];
+
+					if (c == delimc || c == quotec || c == '\n' || c == '\r')
+					{
+						use_quote = true;
+						break;
+					}
+				}
+			}
+		}
+	}
+
+	if (use_quote)
+	{
+		int		i;
+		int		start_idx = 0;
+		Vector8 chunk;
+
+		CopySendChar(cstate, quotec);
+
+		for (i = 0; i < vlen; i += sizeof(Vector8))
+		{
+			vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+			if (vector8_has(chunk, (unsigned char) escapec) ||
+				vector8_has(chunk, (unsigned char) quotec))
+			{
+				/*
+				 * This chunk has one or more characters that require
+				 * escaping, so switch to byte-at-a-time processing
+				 */
+				for (int j = i; j < (i + sizeof(Vector8)); j++)
+				{
+					char c = ptr[j];
+
+					if (c == quotec || c == escapec)
+					{
+						if (j > start_idx)
+							CopySendData(cstate, ptr + start_idx, j - start_idx);
+
+						CopySendChar(cstate, escapec);
+						start_idx = j;
+					}
+				}
+			}
+		}
+
+		/* Process the tail of the string */
+		for (; i < len; i++)
+		{
+			char c = ptr[i];
+
+			if (c == quotec || c == escapec)
+			{
+				if (i > start_idx)
+					CopySendData(cstate, ptr + start_idx, i - start_idx);
+
+				CopySendChar(cstate, escapec);
+				start_idx = i;
+			}
+		}
+
+		/* Send any remaining text */
+		if (start_idx < len)
+			CopySendData(cstate, ptr + start_idx, len - start_idx);
+
+		CopySendChar(cstate, quotec);
+	}
+	else
+	{
+		/* If it doesn't need quoting, we can just dump it as-is */
+		CopySendData(cstate, ptr, len);
+	}
+}
+
+
 /*
  * Send text representation of one attribute, with conversion and
  * CSV-style escaping
@@ -1141,7 +1280,6 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	char		delimc = cstate->opts.delim[0];
 	char		quotec = cstate->opts.quote[0];
 	char		escapec = cstate->opts.escape[0];
-	bool		single_attr = (list_length(cstate->attnumlist) == 1);
 
 	/* force quoting if it matches null_print (before conversion!) */
 	if (!use_quote && strcmp(string, cstate->opts.null_print) == 0)
@@ -1152,11 +1290,19 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	else
 		ptr = string;
 
+	if (!cstate->encoding_embeds_ascii)
+	{
+		CopyAttributeOutCSVVector(cstate, ptr, use_quote);
+		return;
+	}
+
 	/*
 	 * Make a preliminary pass to discover if it needs quoting
 	 */
 	if (!use_quote)
 	{
+		bool	single_attr = (list_length(cstate->attnumlist) == 1);
+
 		/*
 		 * Because '\.' can be a data value, quote it if it appears alone on a
 		 * line so it is not interpreted as the end-of-data marker.
@@ -1174,7 +1320,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 					use_quote = true;
 					break;
 				}
-				if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+				if (IS_HIGHBIT_SET(c))
 					tptr += pg_encoding_mblen(cstate->file_encoding, tptr);
 				else
 					tptr++;
@@ -1198,7 +1344,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 				CopySendChar(cstate, escapec);
 				start = ptr;	/* we include char in next run */
 			}
-			if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+			if (IS_HIGHBIT_SET(c))
 				ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
 			else
 				ptr++;
-- 
2.39.3 (Apple Git-146)

v2-0004-Optimize-COPY-TO-in-text-format-using-SIMD.patch

From 7f816d88ba3d9e3d2c2d65b4160af84885cf37d5 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 18:00:23 -0400
Subject: [PATCH v2 4/4] Optimize COPY TO in text format using SIMD

CopyAttributeOutText() does a byte-by-byte loop looking for field delimiters and
escape sequences. Vectorizing this loop using SIMD yields a significant speedup
for wide attributes, assuming that escape sequences are rare.

We don't attempt to apply this optimization when encoding_embeds_ascii is true,
because the bookkeeping required would be complicated.
---
 src/backend/commands/copyto.c | 275 ++++++++++++++++++----------------
 1 file changed, 149 insertions(+), 126 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9114bb1c48..e676b4e888 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -970,6 +970,140 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldcontext);
 }
 
+static void
+EmitTextCharacter(CopyToState cstate, char c)
+{
+	char	delimc = cstate->opts.delim[0];
+
+	if ((unsigned char) c < (unsigned char) 0x20)
+	{
+		/*
+		 * \r and \n must be escaped; we choose to escape several other common
+		 * control characters for the sake of tradition. We prefer to dump
+		 * these using the C-like notation, rather than a backslash and the
+		 * literal character, because it makes the dump file a bit more proof
+		 * against Microsoftish data mangling.
+		 */
+		switch (c)
+		{
+			case '\b':
+				c = 'b';
+				break;
+			case '\f':
+				c = 'f';
+				break;
+			case '\n':
+				c = 'n';
+				break;
+			case '\r':
+				c = 'r';
+				break;
+			case '\t':
+				c = 't';
+				break;
+			case '\v':
+				c = 'v';
+				break;
+			default:
+				/*
+				 * Record delimiter must be escaped, even if it is a control
+				 * character. Other control characters can be emitted as-is.
+				 */
+				if (c != delimc)
+				{
+					CopySendChar(cstate, c);
+					return;
+				}
+		}
+
+		CopySendChar(cstate, '\\');
+		CopySendChar(cstate, c);
+	}
+	else if (c == '\\' || c == delimc)
+	{
+		CopySendChar(cstate, '\\');
+		CopySendChar(cstate, c);
+	}
+	else
+	{
+		CopySendChar(cstate, c);
+	}
+}
+
+/*
+ * Send text representation of one attribute, with conversion and escaping.
+ * This variant is vectorized using SIMD instructions.  This is significantly
+ * faster for wide attributes, assuming that control characters are rare.
+ *
+ * This variant assumes that encoding_embeds_ascii is false.  This simplifies
+ * the implementation because we can look at arbitrary-sized chunks of bytes,
+ * without needing to go through the pg_encoding_mblen() machinery to ensure
+ * that multibyte characters don't cross chunk boundaries.  In principle we
+ * could combine vectorization with such encodings, but the bookkeeping
+ * required would be complicated.
+ */
+static void
+CopyAttributeOutTextVector(CopyToState cstate, const char *ptr)
+{
+	int			i;
+	int			len;
+	int			vlen;
+	int			start_idx;
+	Vector8		chunk;
+	char		delimc = cstate->opts.delim[0];
+
+	len = strlen(ptr);
+	vlen = len & (int) (~(sizeof(Vector8) - 1));
+	start_idx = 0;
+
+	for (i = 0; i < vlen; i += sizeof(Vector8))
+	{
+		vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+		/*
+		 * Check if the chunk contains any field delimiters or escape
+		 * sequences.  If so, switch to byte-by-byte processing.
+		 */
+		if (vector8_has_le(chunk, (unsigned char) 0x01f) ||
+			vector8_has(chunk, (unsigned char) '\\') ||
+			vector8_has(chunk, (unsigned char) delimc))
+		{
+			if (i > start_idx)
+			{
+				CopySendData(cstate, ptr + start_idx, i - start_idx);
+				start_idx = i;
+			}
+
+			for (int j = i; j < (i + sizeof(Vector8)); j++)
+			{
+				EmitTextCharacter(cstate, ptr[j]);
+				start_idx++;
+			}
+		}
+	}
+
+	/* Process the tail of the string */
+	for (; i < len; i++)
+	{
+		char c = ptr[i];
+
+		if ((unsigned char) c < (unsigned char) 0x20 ||
+			c == '\\' || c == delimc)
+		{
+			if (i > start_idx)
+			{
+				CopySendData(cstate, ptr + start_idx, i - start_idx);
+				start_idx = i;
+			}
+			EmitTextCharacter(cstate, c);
+			start_idx++;
+		}
+	}
+
+	if (i > start_idx)
+		CopySendData(cstate, ptr + start_idx, i - start_idx);
+}
+
 #define DUMPSOFAR() \
 	do { \
 		if (ptr > start) \
@@ -992,137 +1126,26 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 		ptr = string;
 
-	/*
-	 * We have to grovel through the string searching for control characters
-	 * and instances of the delimiter character.  In most cases, though, these
-	 * are infrequent.  To avoid overhead from calling CopySendData once per
-	 * character, we dump out all characters between escaped characters in a
-	 * single call.  The loop invariant is that the data from "start" to "ptr"
-	 * can be sent literally, but hasn't yet been.
-	 *
-	 * We can skip pg_encoding_mblen() overhead when encoding is safe, because
-	 * in valid backend encodings, extra bytes of a multibyte character never
-	 * look like ASCII.  This loop is sufficiently performance-critical that
-	 * it's worth making two copies of it to get the IS_HIGHBIT_SET() test out
-	 * of the normal safe-encoding path.
-	 */
-	if (cstate->encoding_embeds_ascii)
+	if (!cstate->encoding_embeds_ascii)
 	{
-		start = ptr;
-		while ((c = *ptr) != '\0')
-		{
-			if ((unsigned char) c < (unsigned char) 0x20)
-			{
-				/*
-				 * \r and \n must be escaped, the others are traditional. We
-				 * prefer to dump these using the C-like notation, rather than
-				 * a backslash and the literal character, because it makes the
-				 * dump file a bit more proof against Microsoftish data
-				 * mangling.
-				 */
-				switch (c)
-				{
-					case '\b':
-						c = 'b';
-						break;
-					case '\f':
-						c = 'f';
-						break;
-					case '\n':
-						c = 'n';
-						break;
-					case '\r':
-						c = 'r';
-						break;
-					case '\t':
-						c = 't';
-						break;
-					case '\v':
-						c = 'v';
-						break;
-					default:
-						/* If it's the delimiter, must backslash it */
-						if (c == delimc)
-							break;
-						/* All ASCII control chars are length 1 */
-						ptr++;
-						continue;	/* fall to end of loop */
-				}
-				/* if we get here, we need to convert the control char */
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				CopySendChar(cstate, c);
-				start = ++ptr;	/* do not include char in next run */
-			}
-			else if (c == '\\' || c == delimc)
-			{
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				start = ptr++;	/* we include char in next run */
-			}
-			else if (IS_HIGHBIT_SET(c))
-				ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
-			else
-				ptr++;
-		}
+		CopyAttributeOutTextVector(cstate, ptr);
+		return;
 	}
-	else
+
+	start = ptr;
+	while ((c = *ptr) != '\0')
 	{
-		start = ptr;
-		while ((c = *ptr) != '\0')
+		if ((unsigned char) c < (unsigned char) 0x20 ||
+			c == '\\' || c == delimc)
 		{
-			if ((unsigned char) c < (unsigned char) 0x20)
-			{
-				/*
-				 * \r and \n must be escaped, the others are traditional. We
-				 * prefer to dump these using the C-like notation, rather than
-				 * a backslash and the literal character, because it makes the
-				 * dump file a bit more proof against Microsoftish data
-				 * mangling.
-				 */
-				switch (c)
-				{
-					case '\b':
-						c = 'b';
-						break;
-					case '\f':
-						c = 'f';
-						break;
-					case '\n':
-						c = 'n';
-						break;
-					case '\r':
-						c = 'r';
-						break;
-					case '\t':
-						c = 't';
-						break;
-					case '\v':
-						c = 'v';
-						break;
-					default:
-						/* If it's the delimiter, must backslash it */
-						if (c == delimc)
-							break;
-						/* All ASCII control chars are length 1 */
-						ptr++;
-						continue;	/* fall to end of loop */
-				}
-				/* if we get here, we need to convert the control char */
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				CopySendChar(cstate, c);
-				start = ++ptr;	/* do not include char in next run */
-			}
-			else if (c == '\\' || c == delimc)
-			{
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				start = ptr++;	/* we include char in next run */
-			}
-			else
-				ptr++;
+			DUMPSOFAR();
+			EmitTextCharacter(cstate, c);
+			start = ++ptr;
 		}
+		else if (IS_HIGHBIT_SET(c))
+			ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
+		else
+			ptr++;
 	}
 
 	DUMPSOFAR();
-- 
2.39.3 (Apple Git-146)

Nathan Bossart

nathandbossart@gmail.com

over 1 year ago

In reply to: Neil Conway (#5)

Re: Optimizing COPY with SIMD

On Wed, Jun 05, 2024 at 01:46:44PM -0400, Neil Conway wrote:

> We could go further and use the same code to handle both the tail of the
> string in the vectorized case and the entire string in the non-vectorized
> case, but I didn't bother with that -- as written, it would require taking
> an unnecessary strlen() of the input string in the non-vectorized case.

For pg_lfind32(), we ended up using an overlapping approach for the
vectorized case (see commit 7644a73). That appeared to help more than it
harmed in the many (admittedly branch predictor friendly) tests I ran. I
wonder if you could do something similar here.

> Looks like there is a slight regression for short attribute values, but I
> think the tradeoff is a net win.

It'd be interesting to see the threshold where your patch starts winning.
IIUC the vector stuff won't take effect until there are 16 bytes to
process. If we don't expect attributes to ordinarily be >= 16 bytes, it
might be worth trying to mitigate this ~3% regression. Maybe we can find
some other small gains elsewhere to offset it.

--
nathan

Neil Conway

neil.conway@gmail.com

over 1 year ago

In reply to: Nathan Bossart (#6)

6 attachment(s)

Re: Optimizing COPY with SIMD

On Wed, Jun 5, 2024 at 3:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

> For pg_lfind32(), we ended up using an overlapping approach for the
> vectorized case (see commit 7644a73). That appeared to help more than it
> harmed in the many (admittedly branch predictor friendly) tests I ran. I
> wonder if you could do something similar here.

I didn't entirely follow what you are suggesting here -- seems like we
would need to do strlen() for the non-SIMD case if we tried to use a
similar approach.

> It'd be interesting to see the threshold where your patch starts winning.
> IIUC the vector stuff won't take effect until there are 16 bytes to
> process. If we don't expect attributes to ordinarily be >= 16 bytes, it
> might be worth trying to mitigate this ~3% regression. Maybe we can find
> some other small gains elsewhere to offset it.

For the particular short-strings benchmark I have been using (3 columns
with 8-character ASCII strings in each), I suspect the regression is caused
by the extra strlen(), rather than by the vectorized loop itself: 8-byte
values never enter the vectorized loop, because sizeof(Vector8) == 16 on
this machine. This also explains why we see a regression on short strings
for text but not for CSV, since the CSV code already needed a strlen() for
the non-quoted-string case. Unfortunately, this makes it tricky to make the
optimization conditional on the length of the string, because we only know
the length after paying for the strlen(). I suppose we could play some games
where we start with a byte-by-byte loop and then switch over to the
vectorized path (and take a strlen()) once we have seen more than, say,
sizeof(Vector8) bytes. Seems a bit kludgy, though.

I will do some more benchmarking and report back. For the time being, I'm
not inclined to push to get CopyAttributeOutTextVector() into the tree
in its current state, as I agree that the short-attribute case is quite
important.

In the meantime, attached is a revised patch series, which also uses SIMD
to optimize CopyReadLineText() in COPY FROM. Performance results:

====
master @ 8fea1bd5411b:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
Time (mean ± σ): 1.944 s ± 0.013 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.927 s … 1.975 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
Time (mean ± σ): 1.021 s ± 0.017 s [User: 0.002 s, System: 0.001 s]
Range (min … max): 1.005 s … 1.053 s 10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
Time (mean ± σ): 1.513 s ± 0.022 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.493 s … 1.552 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
Time (mean ± σ): 1.032 s ± 0.032 s [User: 0.002 s, System: 0.001 s]
Range (min … max): 1.009 s … 1.113 s 10 runs
====

Neil

Attachments:

v4-0005-Optimize-COPY-TO-in-text-format-using-SIMD.patch

From 2971ed42796b5e79b8ffc0dd84b8474874fb82e8 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 18:00:23 -0400
Subject: [PATCH v4 5/6] Optimize COPY TO in text format using SIMD

CopyAttributeOutText() does a byte-by-byte loop looking for field delimiters and
escape sequences. Vectorizing this loop using SIMD yields a significant speedup
for wide attributes, assuming that escape sequences are rare.

We don't attempt to apply this optimization when encoding_embeds_ascii is true,
because the bookkeeping required would be complicated.
---
 src/backend/commands/copyto.c | 280 +++++++++++++++++++---------------
 1 file changed, 154 insertions(+), 126 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9114bb1c48..2453aa08df 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -970,6 +970,145 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldcontext);
 }
 
+static void
+EmitTextCharacter(CopyToState cstate, char c)
+{
+	char	delimc = cstate->opts.delim[0];
+
+	if ((unsigned char) c < (unsigned char) 0x20)
+	{
+		/*
+		 * \r and \n must be escaped; we choose to escape several other common
+		 * control characters for the sake of tradition. We prefer to dump
+		 * these using the C-like notation, rather than a backslash and the
+		 * literal character, because it makes the dump file a bit more proof
+		 * against Microsoftish data mangling.
+		 */
+		switch (c)
+		{
+			case '\b':
+				c = 'b';
+				break;
+			case '\f':
+				c = 'f';
+				break;
+			case '\n':
+				c = 'n';
+				break;
+			case '\r':
+				c = 'r';
+				break;
+			case '\t':
+				c = 't';
+				break;
+			case '\v':
+				c = 'v';
+				break;
+			default:
+				/*
+				 * The field delimiter must be escaped, even if it is a control
+				 * character. Other control characters can be emitted as-is.
+				 */
+				if (c != delimc)
+				{
+					CopySendChar(cstate, c);
+					return;
+				}
+		}
+
+		CopySendChar(cstate, '\\');
+		CopySendChar(cstate, c);
+	}
+	else if (c == '\\' || c == delimc)
+	{
+		CopySendChar(cstate, '\\');
+		CopySendChar(cstate, c);
+	}
+	else
+	{
+		CopySendChar(cstate, c);
+	}
+}
+
+/*
+ * Send text representation of one attribute, with conversion and escaping.
+ * This variant is vectorized using SIMD instructions.  This is significantly
+ * faster for wide attributes, assuming that control characters are rare.
+ *
+ * This variant assumes that encoding_embeds_ascii is false.  This simplifies
+ * the implementation because we can look at arbitrary-sized chunks of bytes,
+ * without needing to go through the pg_encoding_mblen() machinery to ensure
+ * that multibyte characters don't cross chunk boundaries.  In principle we
+ * could combine vectorization with such encodings, but the bookkeeping
+ * required would be complicated.
+ */
+static void
+CopyAttributeOutTextVector(CopyToState cstate, const char *ptr)
+{
+	int			i;
+	int			len;
+	int			vlen;
+	int			start_idx;
+	Vector8		chunk;
+	char		delimc = cstate->opts.delim[0];
+
+	len = strlen(ptr);
+	vlen = len & (int) (~(sizeof(Vector8) - 1));
+	start_idx = 0;
+
+	for (i = 0; i < vlen; i += sizeof(Vector8))
+	{
+		vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+		/*
+		 * Check if the chunk contains any bytes needing escaping (control chars,
+		 * backslash, or delimiter).  If so, switch to byte-by-byte processing.
+		 */
+		if (vector8_has_le(chunk, (unsigned char) 0x1f) ||
+			vector8_has(chunk, (unsigned char) '\\') ||
+			vector8_has(chunk, (unsigned char) delimc))
+		{
+			for (int j = i; j < (i + sizeof(Vector8)); j++)
+			{
+				char c = ptr[j];
+
+				if ((unsigned char) c <= (unsigned char) 0x1f ||
+					c == '\\' || c == delimc)
+				{
+					if (j > start_idx)
+					{
+						CopySendData(cstate, ptr + start_idx, j - start_idx);
+						start_idx = j;
+					}
+					EmitTextCharacter(cstate, c);
+					start_idx++;
+				}
+			}
+		}
+	}
+
+	/* Process the tail of the string */
+	for (; i < len; i++)
+	{
+		char c = ptr[i];
+
+		if ((unsigned char) c <= (unsigned char) 0x1f ||
+			c == '\\' || c == delimc)
+		{
+			if (i > start_idx)
+			{
+				CopySendData(cstate, ptr + start_idx, i - start_idx);
+				start_idx = i;
+			}
+			EmitTextCharacter(cstate, c);
+			start_idx++;
+		}
+	}
+
+	if (i > start_idx)
+		CopySendData(cstate, ptr + start_idx, i - start_idx);
+}
+
 #define DUMPSOFAR() \
 	do { \
 		if (ptr > start) \
@@ -992,137 +1131,26 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 		ptr = string;
 
-	/*
-	 * We have to grovel through the string searching for control characters
-	 * and instances of the delimiter character.  In most cases, though, these
-	 * are infrequent.  To avoid overhead from calling CopySendData once per
-	 * character, we dump out all characters between escaped characters in a
-	 * single call.  The loop invariant is that the data from "start" to "ptr"
-	 * can be sent literally, but hasn't yet been.
-	 *
-	 * We can skip pg_encoding_mblen() overhead when encoding is safe, because
-	 * in valid backend encodings, extra bytes of a multibyte character never
-	 * look like ASCII.  This loop is sufficiently performance-critical that
-	 * it's worth making two copies of it to get the IS_HIGHBIT_SET() test out
-	 * of the normal safe-encoding path.
-	 */
-	if (cstate->encoding_embeds_ascii)
+	if (!cstate->encoding_embeds_ascii)
 	{
-		start = ptr;
-		while ((c = *ptr) != '\0')
-		{
-			if ((unsigned char) c < (unsigned char) 0x20)
-			{
-				/*
-				 * \r and \n must be escaped, the others are traditional. We
-				 * prefer to dump these using the C-like notation, rather than
-				 * a backslash and the literal character, because it makes the
-				 * dump file a bit more proof against Microsoftish data
-				 * mangling.
-				 */
-				switch (c)
-				{
-					case '\b':
-						c = 'b';
-						break;
-					case '\f':
-						c = 'f';
-						break;
-					case '\n':
-						c = 'n';
-						break;
-					case '\r':
-						c = 'r';
-						break;
-					case '\t':
-						c = 't';
-						break;
-					case '\v':
-						c = 'v';
-						break;
-					default:
-						/* If it's the delimiter, must backslash it */
-						if (c == delimc)
-							break;
-						/* All ASCII control chars are length 1 */
-						ptr++;
-						continue;	/* fall to end of loop */
-				}
-				/* if we get here, we need to convert the control char */
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				CopySendChar(cstate, c);
-				start = ++ptr;	/* do not include char in next run */
-			}
-			else if (c == '\\' || c == delimc)
-			{
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				start = ptr++;	/* we include char in next run */
-			}
-			else if (IS_HIGHBIT_SET(c))
-				ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
-			else
-				ptr++;
-		}
+		CopyAttributeOutTextVector(cstate, ptr);
+		return;
 	}
-	else
+
+	start = ptr;
+	while ((c = *ptr) != '\0')
 	{
-		start = ptr;
-		while ((c = *ptr) != '\0')
+		if ((unsigned char) c < (unsigned char) 0x20 ||
+			c == '\\' || c == delimc)
 		{
-			if ((unsigned char) c < (unsigned char) 0x20)
-			{
-				/*
-				 * \r and \n must be escaped, the others are traditional. We
-				 * prefer to dump these using the C-like notation, rather than
-				 * a backslash and the literal character, because it makes the
-				 * dump file a bit more proof against Microsoftish data
-				 * mangling.
-				 */
-				switch (c)
-				{
-					case '\b':
-						c = 'b';
-						break;
-					case '\f':
-						c = 'f';
-						break;
-					case '\n':
-						c = 'n';
-						break;
-					case '\r':
-						c = 'r';
-						break;
-					case '\t':
-						c = 't';
-						break;
-					case '\v':
-						c = 'v';
-						break;
-					default:
-						/* If it's the delimiter, must backslash it */
-						if (c == delimc)
-							break;
-						/* All ASCII control chars are length 1 */
-						ptr++;
-						continue;	/* fall to end of loop */
-				}
-				/* if we get here, we need to convert the control char */
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				CopySendChar(cstate, c);
-				start = ++ptr;	/* do not include char in next run */
-			}
-			else if (c == '\\' || c == delimc)
-			{
-				DUMPSOFAR();
-				CopySendChar(cstate, '\\');
-				start = ptr++;	/* we include char in next run */
-			}
-			else
-				ptr++;
+			DUMPSOFAR();
+			EmitTextCharacter(cstate, c);
+			start = ++ptr;
 		}
+		else if (IS_HIGHBIT_SET(c))
+			ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
+		else
+			ptr++;
 	}
 
 	DUMPSOFAR();
-- 
2.39.3 (Apple Git-146)

v4-0003-Cosmetic-code-cleanup-for-CopyReadLineText.patch

From 3292dcda3d7a307af0fc09d8eb276cfeb1911058 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil.conway@gmail.com>
Date: Fri, 7 Jun 2024 08:40:06 -0400
Subject: [PATCH v4 3/6] Cosmetic code cleanup for CopyReadLineText()

Use a consistent naming prefix for variables relating to the COPY input buffer.
---
 src/backend/commands/copyfromparse.c | 30 ++++++++++++++--------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7efcb89159..067a33f924 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -98,7 +98,7 @@
 #define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
 if (1) \
 { \
-	if (input_buf_ptr + (extralen) >= copy_buf_len && !hit_eof) \
+	if (input_buf_ptr + (extralen) >= input_buf_len && !hit_eof) \
 	{ \
 		input_buf_ptr = prev_raw_ptr; /* undo fetch */ \
 		need_data = true; \
@@ -110,10 +110,10 @@ if (1) \
 #define IF_NEED_REFILL_AND_EOF_BREAK(extralen) \
 if (1) \
 { \
-	if (input_buf_ptr + (extralen) >= copy_buf_len && hit_eof) \
+	if (input_buf_ptr + (extralen) >= input_buf_len && hit_eof) \
 	{ \
 		if (extralen) \
-			input_buf_ptr = copy_buf_len; /* consume the partial character */ \
+			input_buf_ptr = input_buf_len; /* consume the partial character */ \
 		/* backslash just before EOF, treat as data char */ \
 		result = true; \
 		break; \
@@ -130,7 +130,7 @@ if (1) \
 	if (input_buf_ptr > cstate->input_buf_index) \
 	{ \
 		appendBinaryStringInfo(&cstate->line_buf, \
-							 cstate->input_buf + cstate->input_buf_index, \
+							   cstate->input_buf + cstate->input_buf_index, \
 							   input_buf_ptr - cstate->input_buf_index); \
 		cstate->input_buf_index = input_buf_ptr; \
 	} \
@@ -1174,9 +1174,9 @@ CopyReadLine(CopyFromState cstate)
 static bool
 CopyReadLineText(CopyFromState cstate)
 {
-	char	   *copy_input_buf;
+	char	   *input_buf;
 	int			input_buf_ptr;
-	int			copy_buf_len;
+	int			input_buf_len;
 	bool		need_data = false;
 	bool		hit_eof = false;
 	bool		result = false;
@@ -1222,9 +1222,9 @@ CopyReadLineText(CopyFromState cstate)
 	 * For a little extra speed within the loop, we copy input_buf and
 	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
+	input_buf = cstate->input_buf;
 	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
+	input_buf_len = cstate->input_buf_len;
 
 	for (;;)
 	{
@@ -1239,7 +1239,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * unsafe with the old v2 COPY protocol, but we don't support that
 		 * anymore.
 		 */
-		if (input_buf_ptr >= copy_buf_len || need_data)
+		if (input_buf_ptr >= input_buf_len || need_data)
 		{
 			REFILL_LINEBUF;
 
@@ -1247,7 +1247,7 @@ CopyReadLineText(CopyFromState cstate)
 			/* update our local variables */
 			hit_eof = cstate->input_reached_eof;
 			input_buf_ptr = cstate->input_buf_index;
-			copy_buf_len = cstate->input_buf_len;
+			input_buf_len = cstate->input_buf_len;
 
 			/*
 			 * If we are completely out of data, break out of the loop,
@@ -1263,7 +1263,7 @@ CopyReadLineText(CopyFromState cstate)
 
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
-		c = copy_input_buf[input_buf_ptr++];
+		c = input_buf[input_buf_ptr++];
 
 		if (cstate->opts.csv_mode)
 		{
@@ -1319,7 +1319,7 @@ CopyReadLineText(CopyFromState cstate)
 				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 
 				/* get next char */
-				c = copy_input_buf[input_buf_ptr];
+				c = input_buf[input_buf_ptr];
 
 				if (c == '\n')
 				{
@@ -1393,7 +1393,7 @@ CopyReadLineText(CopyFromState cstate)
 			 * through and continue processing.
 			 * -----
 			 */
-			c2 = copy_input_buf[input_buf_ptr];
+			c2 = input_buf[input_buf_ptr];
 
 			if (c2 == '.')
 			{
@@ -1409,7 +1409,7 @@ CopyReadLineText(CopyFromState cstate)
 					/* Get the next character */
 					IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 					/* if hit_eof, c2 will become '\0' */
-					c2 = copy_input_buf[input_buf_ptr++];
+					c2 = input_buf[input_buf_ptr++];
 
 					if (c2 == '\n')
 					{
@@ -1434,7 +1434,7 @@ CopyReadLineText(CopyFromState cstate)
 				/* Get the next character */
 				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 				/* if hit_eof, c2 will become '\0' */
-				c2 = copy_input_buf[input_buf_ptr++];
+				c2 = input_buf[input_buf_ptr++];
 
 				if (c2 != '\r' && c2 != '\n')
 				{
-- 
2.39.3 (Apple Git-146)

v4-0004-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch

From f3e91bbf9a587fa91762aa3b8ba2f4b5838477b8 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 14:00:58 -0400
Subject: [PATCH v4 4/6] Optimize COPY TO in CSV format using SIMD

CopyAttributeOutCSV() does one or two byte-by-byte loops over the text of each
attribute, depending on whether quoting is required. Implementing these loops
using SIMD yields a significant speedup for long attribute values. For short
attribute values, performance is roughly unchanged.

We don't attempt to apply this optimization when encoding_embeds_ascii is true,
because the required bookkeeping would be complicated.
---
 src/backend/commands/copyto.c | 152 +++++++++++++++++++++++++++++++++-
 1 file changed, 149 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cd2d7bb217..9114bb1c48 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -29,6 +29,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -1127,6 +1128,144 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	DUMPSOFAR();
 }
 
+/*
+ * Send text representation of one attribute, with conversion and CSV-style
+ * escaping.  This is significantly faster for wide attributes, assuming that
+ * control characters are rare.
+ *
+ * This variant assumes that encoding_embeds_ascii is false.  This simplifies
+ * the implementation because we can look at arbitrary-sized chunks of bytes,
+ * without needing to go through the pg_encoding_mblen() machinery to ensure
+ * that multibyte characters don't cross chunk boundaries.  In principle we
+ * could combine vectorization with such encodings, but the bookkeeping
+ * required would be complicated.
+ */
+static void
+CopyAttributeOutCSVVector(CopyToState cstate, const char *ptr,
+						  bool use_quote)
+{
+	int			len;
+	int			vlen;
+	char		delimc = cstate->opts.delim[0];
+	char		quotec = cstate->opts.quote[0];
+	char		escapec = cstate->opts.escape[0];
+
+	len = strlen(ptr);
+	vlen = len & (int) (~(sizeof(Vector8) - 1));
+
+	/*
+	 * Make a preliminary pass to discover if it needs quoting
+	 */
+	if (!use_quote)
+	{
+		bool	single_attr = (list_length(cstate->attnumlist) == 1);
+
+		/*
+		 * Because '\.' can be a data value, quote it if it appears alone on a
+		 * line so it is not interpreted as the end-of-data marker.
+		 */
+		if (single_attr && strcmp(ptr, "\\.") == 0)
+			use_quote = true;
+		else
+		{
+			int		i;
+			Vector8 chunk;
+
+			for (i = 0; i < vlen; i += sizeof(Vector8))
+			{
+				vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+				if (vector8_has(chunk, (unsigned char) delimc) ||
+					vector8_has(chunk, (unsigned char) quotec) ||
+					vector8_has(chunk, (unsigned char) '\n') ||
+					vector8_has(chunk, (unsigned char) '\r'))
+				{
+					use_quote = true;
+					break;
+				}
+			}
+
+			/* Check the tail of the string */
+			if (!use_quote)
+			{
+				for (; i < len; i++)
+				{
+					char c = ptr[i];
+
+					if (c == delimc || c == quotec || c == '\n' || c == '\r')
+					{
+						use_quote = true;
+						break;
+					}
+				}
+			}
+		}
+	}
+
+	if (use_quote)
+	{
+		int		i;
+		int		start_idx = 0;
+		Vector8 chunk;
+
+		CopySendChar(cstate, quotec);
+
+		for (i = 0; i < vlen; i += sizeof(Vector8))
+		{
+			vector8_load(&chunk, (const uint8 *) &ptr[i]);
+
+			if (vector8_has(chunk, (unsigned char) quotec) ||
+				vector8_has(chunk, (unsigned char) escapec))
+			{
+				/*
+				 * This chunk has one or more characters that require
+				 * escaping, so switch to byte-at-a-time processing
+				 */
+				for (int j = i; j < (i + sizeof(Vector8)); j++)
+				{
+					char c = ptr[j];
+
+					if (c == quotec || c == escapec)
+					{
+						if (j > start_idx)
+							CopySendData(cstate, ptr + start_idx, j - start_idx);
+
+						CopySendChar(cstate, escapec);
+						start_idx = j;
+					}
+				}
+			}
+		}
+
+		/* Process the tail of the string */
+		for (; i < len; i++)
+		{
+			char c = ptr[i];
+
+			if (c == quotec || c == escapec)
+			{
+				if (i > start_idx)
+					CopySendData(cstate, ptr + start_idx, i - start_idx);
+
+				CopySendChar(cstate, escapec);
+				start_idx = i;
+			}
+		}
+
+		/* Send any remaining text */
+		if (start_idx < len)
+			CopySendData(cstate, ptr + start_idx, len - start_idx);
+
+		CopySendChar(cstate, quotec);
+	}
+	else
+	{
+		/* If it doesn't need quoting, we can just dump it as-is */
+		CopySendData(cstate, ptr, len);
+	}
+}
+
+
 /*
  * Send text representation of one attribute, with conversion and
  * CSV-style escaping
@@ -1141,7 +1280,6 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	char		delimc = cstate->opts.delim[0];
 	char		quotec = cstate->opts.quote[0];
 	char		escapec = cstate->opts.escape[0];
-	bool		single_attr = (list_length(cstate->attnumlist) == 1);
 
 	/* force quoting if it matches null_print (before conversion!) */
 	if (!use_quote && strcmp(string, cstate->opts.null_print) == 0)
@@ -1152,11 +1290,19 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	else
 		ptr = string;
 
+	if (!cstate->encoding_embeds_ascii)
+	{
+		CopyAttributeOutCSVVector(cstate, ptr, use_quote);
+		return;
+	}
+
 	/*
 	 * Make a preliminary pass to discover if it needs quoting
 	 */
 	if (!use_quote)
 	{
+		bool	single_attr = (list_length(cstate->attnumlist) == 1);
+
 		/*
 		 * Because '\.' can be a data value, quote it if it appears alone on a
 		 * line so it is not interpreted as the end-of-data marker.
@@ -1174,7 +1320,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 					use_quote = true;
 					break;
 				}
-				if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+				if (IS_HIGHBIT_SET(c))
 					tptr += pg_encoding_mblen(cstate->file_encoding, tptr);
 				else
 					tptr++;
@@ -1198,7 +1344,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 				CopySendChar(cstate, escapec);
 				start = ptr;	/* we include char in next run */
 			}
-			if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+			if (IS_HIGHBIT_SET(c))
 				ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
 			else
 				ptr++;
-- 
2.39.3 (Apple Git-146)

v4-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch

From 39650379c974d64631960510be12d88595249641 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 18:50:30 -0400
Subject: [PATCH v4 2/6] Improve COPY test coverage for handling of control
 characters

---
 src/test/regress/expected/copy2.out | 38 +++++++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql      | 25 +++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 931542f268..be654daf4a 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -905,3 +905,41 @@ truncate copy_default;
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
 ERROR:  COPY DEFAULT only available using COPY FROM
+-- Test handling of control characters
+create temp table copy_ctl (a text, b text);
+copy copy_ctl from stdin;
+copy copy_ctl to stdout;
+abc	def
+\n	def
+abc	\n
+\t	def
+abc	\t
+ab\b	\bdef
+a\vbc	def\v
+\f\f\f	g\fg
+\f	def
+abc	
+	def
+abc	
+\\	\n\t\\
+9	9999
+S4	j
+copy copy_ctl to stdout with (format csv);
+abc,def
+"
+",def
+abc,"
+"
+	,def
+abc,	
+ab,def
+abc,def
+,gg
+,def
+abc,
+,def
+abc,
+\,"
+	\"
+9,9999
+S4,j
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..e8d495e024 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -676,3 +676,28 @@ truncate copy_default;
 
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
+
+-- Test handling of control characters
+create temp table copy_ctl (a text, b text);
+
+copy copy_ctl from stdin;
+abc	def
+\n	def
+abc	\n
+\t	def
+abc	\t
+ab\b	\bdef
+a\vbc	def\v
+\f\f\f	g\fg
+\x0c	def
+abc	\x6
+\6	def
+abc	\017
+\\	\
+\	\\
+\9	\9999
+\1234	\j
+\.
+
+copy copy_ctl to stdout;
+copy copy_ctl to stdout with (format csv);
-- 
2.39.3 (Apple Git-146)

v4-0001-Adjust-misleading-comment-placement.patch

From 7c8742685dee201f4fc182143dbcc6587f90a8f9 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil@determined.ai>
Date: Sun, 2 Jun 2024 12:01:48 -0400
Subject: [PATCH v4 1/6] Adjust misleading comment placement

---
 src/backend/commands/copyto.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36d7..cd2d7bb217 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -969,15 +969,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldcontext);
 }
 
-/*
- * Send text representation of one attribute, with conversion and escaping
- */
 #define DUMPSOFAR() \
 	do { \
 		if (ptr > start) \
 			CopySendData(cstate, start, ptr - start); \
 	} while (0)
 
+/*
+ * Send text representation of one attribute, with conversion and escaping
+ */
 static void
 CopyAttributeOutText(CopyToState cstate, const char *string)
 {
-- 
2.39.3 (Apple Git-146)

v4-0006-Optimize-COPY-FROM-using-SIMD.patch

From 3da65874d99d0b7e5fd7c918f8c31d3ebcaef338 Mon Sep 17 00:00:00 2001
From: Neil Conway <neil.conway@gmail.com>
Date: Fri, 7 Jun 2024 11:25:43 -0400
Subject: [PATCH v4 6/6] Optimize COPY FROM using SIMD

CopyReadLineText() scans the COPY input buffer looking for newlines and escape
sequences (as well as quotes in CSV mode). We can use SIMD instructions to
efficiently skip chunks of text that don't contain any interesting
characters. This yields a significant performance improvement for wide rows
(many attributes and/or wide attribute values).
---
 src/backend/commands/copyfromparse.c | 36 ++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 067a33f924..f1f2415bde 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -73,6 +73,7 @@
 #include "nodes/miscnodes.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -1180,6 +1181,7 @@ CopyReadLineText(CopyFromState cstate)
 	bool		need_data = false;
 	bool		hit_eof = false;
 	bool		result = false;
+	Vector8		chunk;
 
 	/* CSV variables */
 	bool		first_char_in_line = true;
@@ -1261,6 +1263,40 @@ CopyReadLineText(CopyFromState cstate)
 			need_data = false;
 		}
 
+		/*
+		 * If there is enough data available, use SIMD to check for special
+		 * characters. This allows us to efficiently skip large chunks of
+		 * text, in the common case that newlines and control characters are
+		 * relatively rare.
+		 */
+		Assert(input_buf_ptr <= cstate->input_buf_len);
+		if (cstate->input_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			vector8_load(&chunk, (const uint8 *) &input_buf[input_buf_ptr]);
+
+			if (!(vector8_has(chunk, (unsigned char) '\\') ||
+				  vector8_has(chunk, (unsigned char) '\r') ||
+				  vector8_has(chunk, (unsigned char) '\n') ||
+				  (cstate->opts.csv_mode && vector8_has(
+					  chunk, (unsigned char) escapec)) ||
+				  (cstate->opts.csv_mode && vector8_has(
+					  chunk, (unsigned char) quotec))))
+			{
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+
+			/*
+			 * Otherwise, proceed byte-by-byte. Note that on subsequent loop
+			 * iterations, because we will only advance input_buf_ptr by a few
+			 * bytes (usually only one), the SIMD checks above will often be
+			 * repeated on an overlapping range of bytes.
+			 *
+			 * TODO: check whether the bookkeeping required to avoid this is
+			 * worth the cost/complexity.
+			 */
+		}
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = input_buf[input_buf_ptr++];
-- 
2.39.3 (Apple Git-146)