Speed up COPY TO text/CSV parsing using SIMD
Hello,
Following Nazir's recommendation to move this to a different thread so it
can be looked at separately.
On Thu, Jan 8, 2026 at 2:49 PM Manni Wood <manni.wood@enterprisedb.com>
wrote:
On Wed, 24 Dec 2025 at 18:08, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
Hello,
Following the same path as optimizing COPY FROM using SIMD, I found that COPY TO can also benefit from this.
I attached a small patch that uses SIMD to skip data and advance as
far as the first special character is found, then fallback to scalar
processing for that character, then re-enter the SIMD path again. There are two ways to do this:
1) Do SIMD until we find a special character, then continue on the scalar path without re-entering SIMD again.
- This gives 10% to 30% speedups depending on the weight of special characters in the attribute; we don't lose anything here since it advances with SIMD until it can't (using the previous scripts: 1/3, 2/3 special chars).
2) Do the SIMD path, then use the scalar path when we hit a special character, and keep re-entering the SIMD path each time.
- This is equivalent to the COPY FROM story; we'll need to find the same heuristic for both COPY FROM/TO to reduce the regressions (same regressions: around 20% to 30% with 1/3, 2/3 special chars).
Something else to note is that the scalar path for COPY TO isn't as heavy as the state machine in COPY FROM.
So if we find the sweet spot for the heuristic, doing the same for
COPY TO will be trivial and always beneficial.
Attached is 0004 which is option 1 (SIMD without re-entering), 0005
is the second one.
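To make the idea concrete, here's a rough sketch of the skip-then-fallback loop from option 1 (portable SWAR standing in for the patch's Vector8 helpers; skip_clean_text and the word-at-a-time constants are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hedged sketch (not the patch itself): find the first byte that needs
 * escaping in COPY TO text format ('\\', '\n', '\r'), skipping clean
 * input a word at a time and falling back to a scalar loop once a chunk
 * contains a special byte (or for the short tail). */
#define REP8(c)     ((uint64_t) 0x0101010101010101ULL * (uint8_t) (c))
#define HASZERO(v)  (((v) - 0x0101010101010101ULL) & ~(v) & 0x8080808080808080ULL)

static size_t
skip_clean_text(const char *s, size_t len)
{
	size_t		i = 0;

	/* fast path: 8 bytes at a time until a chunk holds a special byte */
	while (i + sizeof(uint64_t) <= len)
	{
		uint64_t	chunk;

		memcpy(&chunk, s + i, sizeof(chunk));
		if (HASZERO(chunk ^ REP8('\\')) ||
			HASZERO(chunk ^ REP8('\n')) ||
			HASZERO(chunk ^ REP8('\r')))
			break;
		i += sizeof(chunk);
	}

	/* scalar path: pin down the exact offset (also handles the tail) */
	while (i < len && s[i] != '\\' && s[i] != '\n' && s[i] != '\r')
		i++;
	return i;
}
```

Option 2 differs only in wrapping this in an outer loop that emits the escaped byte and re-enters the fast path, which is where the re-entry heuristic comes in.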
Ayoub Kazar, I tested your v4 "copy to" patch, doing everything in RAM,
and using the cpupower tips from above. (I wanted to test your v5, but `git
apply --check` gave me an error, so I can look at that another day.)

The results look great:
master: (forgot to get commit hash)
text, no special: 8165
text, 1/3 special: 22662
csv, no special: 9619
csv, 1/3 special: 23213

v4 (copy to):
text, no special: 4577 (43.9% speedup)
text, 1/3 special: 22847 (0.8% regression)
csv, no special: 4720 (50.9% speedup)
csv, 1/3 special: 23195 (0.07% regression)

Seems like a very clear win to me!
--
Manni Wood
EDB: https://www.enterprisedb.com
The COPY FROM SIMD optimization is still under review, but for the case
of COPY TO, using the same ideas, we found the problem trivial: the
attached patch gives very nice speedups, as confirmed by Manni's
benchmarks.
Regards,
Ayoub
Attachments:
0004-Speed-up-COPY-TO-text-CSV-using-SIMD.patch (text/x-patch; charset=US-ASCII) +126-1
Hi,
On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
Currently optimizing COPY FROM using SIMD is still under review, but for
the case of COPY TO using the same ideas, we found that the problem is
trivial, the attached patch gives very nice speedups as confirmed by
Manni's benchmarks.
I have a hard time believing that adding a strlen() to the handling of a short
column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are any
to-be-escaped characters.
I also don't think it's good how much code this repeats. I think you'd have to
start with preparatorily moving the existing code into static inline helper
functions and then introduce SIMD into those.
Greetings,
Andres Freund
Hi,
On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
Currently optimizing COPY FROM using SIMD is still under review, but for
the case of COPY TO using the same ideas, we found that the problem is
trivial, the attached patch gives very nice speedups as confirmed by
Manni's benchmarks.

I have a hard time believing that adding a strlen() to the handling of a
short column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are
any to-be-escaped characters.
Thanks for pointing that out; here's what I did:
1) In the previous patch, strlen was called twice if a CSV attribute needed
to add a quote; the attached patch gets the length at the beginning and
uses it for both SIMD paths, so basically one call.
2) If an attribute needs encoding, we need to recalculate the string length
because it can grow (so 2 calls at maximum in all cases).
3) Supposing the very worst cases, I benchmarked this against master for
tables that have 100, 500, 1000 columns, all integers only, so one would
want to process the whole thing in a single pass rather than calculating
the length of such short attributes:
1000 columns:
TEXT: 17% regression
CSV: 3.4% regression
500 columns:
TEXT: 17.7% regression
CSV: 3.1% regression
100 columns:
TEXT: 17.3% regression
CSV: 3% regression
The results are a bit unstable, but the overhead for worst cases like this
is really significant. I can't argue whether this is worth it or not, so
thoughts on this?
I also don't think it's good how much code this repeats. I think you'd have
to start with preparatory moving the existing code into static inline helper
functions and then introduce SIMD into those.
Done, though I'm not too sure whether this is the right place to put it;
let me know.
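For reference, the shape of the refactoring is roughly this (illustrative names and a simplified escape rule, not the actual PostgreSQL code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: the scalar escaping core is hoisted into a
 * static inline helper so both the plain loop and a SIMD-accelerated
 * caller can share it instead of duplicating the logic. */
static inline size_t
emit_text_char(char *out, char c)
{
	if (c == '\\' || c == '\n' || c == '\r')
	{
		out[0] = '\\';
		out[1] = (c == '\n') ? 'n' : (c == '\r') ? 'r' : '\\';
		return 2;
	}
	out[0] = c;
	return 1;
}

/* The scalar attribute writer becomes a thin loop over the helper; a
 * SIMD variant would bulk-copy clean spans and call the same helper
 * only for the special bytes it stops at. */
static inline size_t
copy_attr_out_text_scalar(char *out, const char *s, size_t len)
{
	size_t		n = 0;

	for (size_t i = 0; i < len; i++)
		n += emit_text_char(out + n, s[i]);
	return n;
}
```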
Regards,
Ayoub
Attachments:
v2-0001-Speed-up-COPY-TO-text-CSV-using-SIMD.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Speed-up-COPY-TO-text-CSV-using-SIMD.patchDownload+157-3
On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
I have a hard time believing that adding a strlen() to the handling of a
short column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are
any to-be-escaped characters.

[...]

1000 columns:
TEXT: 17% regression
CSV: 3.4% regression

500 columns:
TEXT: 17.7% regression
CSV: 3.1% regression

100 columns:
TEXT: 17.3% regression
CSV: 3% regression

A bit unstable results, but yeah the overhead for worse cases like this is
really significant, I can't argue whether this is worth it or not, so
thoughts on this?
I seriously doubt we'd commit something that produces a 17% regression
here. Perhaps we should skip the SIMD paths whenever transcoding is
required.
--
nathan
Hello,
On Tue, Mar 10, 2026 at 8:17 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de>
wrote:
I have a hard time believing that adding a strlen() to the handling of a
short column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are
any to-be-escaped characters.

[...]

1000 columns:
TEXT: 17% regression
CSV: 3.4% regression

500 columns:
TEXT: 17.7% regression
CSV: 3.1% regression

100 columns:
TEXT: 17.3% regression
CSV: 3% regression

A bit unstable results, but yeah the overhead for worse cases like this is
really significant, I can't argue whether this is worth it or not, so
thoughts on this?

I seriously doubt we'd commit something that produces a 17% regression
here. Perhaps we should skip the SIMD paths whenever transcoding is
required.

--
nathan
I've spent some time rethinking this, and here's what I've done in v3:
SIMD is only used for varlena attributes whose text representation is
longer than a single SIMD vector, and only when no transcoding is required.
Fixed-size types such as integers mostly produce short ASCII output for
which SIMD provides no benefit.
For eligible attributes, the stored varlena size is used as a cheap
pre-filter to avoid an unnecessary strlen() call on short values.
Here are the benchmark results after many runs compared to master
(4deecb52aff):
TEXT clean: -34.0%
CSV clean: -39.3%
TEXT 1/3: +4.7%
CSV 1/3: -2.3%
The above numbers vary by 1% to 3% in either direction across 20+ runs.
WIDE tables short attributes TEXT:
50 columns: -3.7%
100 columns: -1.7%
200 columns: +1.8%
500 columns: -0.5%
1000 columns: -0.3%
WIDE tables short attributes CSV:
50 columns: -2.5%
100 columns: +1.8%
200 columns: +1.4%
500 columns: -0.9%
1000 columns: -1.1%
The wide-table benchmarks were all similar noise; across 20+ runs it's
always between -2% and +4% for all numbers of columns.
Just a small concern about cases where a varlena has a larger binary size
than its text representation, e.g.:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
pg_column_size
----------------
32
Its text representation is shorter than sizeof(Vector8), so currently v3
would enter the SIMD path and exit right at the beginning (two extra
branches), because it does this:
+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+     VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
I thought maybe we could use 2x or 4x the binary size in the check; it
depends on the type really, but this is just a suggestion if this case is
concerning.
Thoughts?
Regards,
Ayoub
Attachments:
v3-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch (text/x-patch; charset=US-ASCII) +236-19
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
Just a small concern about where some varlenas have a larger binary size
than its text representation ex:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
pg_column_size
----------------
32

its text representation is less than sizeof(Vector8) so currently v3 would
enter SIMD path and exit out just from the beginning (two extra branches)
because it does this:

+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+     VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))

I thought maybe we could do * 2 or * 4 its binary size, depends on the type
really but this is just a proposition if this case is something concerning.
Can we measure the impact of this? How likely is this case?
+static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
+                                                            bool use_simd, size_t len);
+static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
+                                                           bool use_quote, bool use_simd, size_t len);
Can you test this on its own, too? We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.
    if (is_csv)
-        CopyAttributeOutCSV(cstate, string,
-                            cstate->opts.force_quote_flags[attnum - 1]);
+    {
+        if (use_simd)
+            CopyAttributeOutCSV(cstate, string,
+                                cstate->opts.force_quote_flags[attnum - 1],
+                                true, len);
+        else
+            CopyAttributeOutCSV(cstate, string,
+                                cstate->opts.force_quote_flags[attnum - 1],
+                                false, len);
+    }
     else
-        CopyAttributeOutText(cstate, string);
+    {
+        if (use_simd)
+            CopyAttributeOutText(cstate, string, true, len);
+        else
+            CopyAttributeOutText(cstate, string, false, len);
+    }
There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference. As above, it would
be good to measure it.
--
nathan
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
Just a small concern about where some varlenas have a larger binary size
than its text representation ex:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
pg_column_size
----------------
32

its text representation is less than sizeof(Vector8) so currently v3 would
enter SIMD path and exit out just from the beginning (two extra branches)
because it does this:

+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+     VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))

I thought maybe we could do * 2 or * 4 its binary size, depends on the type
really but this is just a proposition if this case is something concerning.
Can we measure the impact of this? How likely is this case?
I'll respond to this separately in a different email.
+static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
+                                                            bool use_simd, size_t len);
+static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
+                                                           bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too? We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.
I tested inlining alone and found an improvement of about 1% to 4% across
all configurations.
The inlining is only meaningful in combination with the SIMD work, for the
reason described below.
    if (is_csv)
-        CopyAttributeOutCSV(cstate, string,
-                            cstate->opts.force_quote_flags[attnum - 1]);
+    {
+        if (use_simd)
+            CopyAttributeOutCSV(cstate, string,
+                                cstate->opts.force_quote_flags[attnum - 1],
+                                true, len);
+        else
+            CopyAttributeOutCSV(cstate, string,
+                                cstate->opts.force_quote_flags[attnum - 1],
+                                false, len);
There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference. As above, it would
be good to measure it.
I compiled three variants:
v3: use_simd passed as a compile-time constant, CopyAttribute functions inlined.
v3_variable: use_simd as a variable, CopyAttribute functions inlined.
v3_variable_noinline: use_simd as a variable, CopyAttribute functions not inlined.
(None of the SIMD helpers are explicitly inlined by us.)
The assembly reveals two things:
1) The CSV SIMD helpers (CopyCheckCSVQuoteNeedSIMD, CopySkipCSVEscapeSIMD)
are naturally inlined by the compiler in all three variants;
CopySkipTextSIMD is never inlined by the compiler in any variant.
2) The constant-passing approach (v3) does matter (just a little,
apparently), specifically for CopySkipTextSIMD.
It's the same story as the COPY FROM patch's first commit: it just emits
code without the use_simd branch:
jbe ...  ; len > sizeof(Vector8)
je  ...  ; need_transcoding
call CopySkipTextSIMD
Whether the extra branching for constant passing is worth it or not is
demonstrated by the benchmarks.
Test Master v3 v3_var v3_var_noinl
TEXT clean 1504ms -24.1% -23.0% -21.5%
CSV clean 1760ms -34.9% -32.7% -33.0%
TEXT 1/3 backslashes 3763ms +4.6% +6.9% +4.1%
CSV 1/3 quotes 3885ms +3.1% +2.7% -0.8%
Wide table TEXT (integer columns):
Cols Master v3 v3_var v3_var_noinl
50 2083ms -0.7% -0.6% +3.5%
100 4094ms -0.1% -0.5% +4.5%
200 1560ms +0.6% -2.3% +3.2%
500 1905ms -1.0% -1.3% +4.7%
1000 1455ms +1.8% +0.4% +4.3%
Wide table CSV:
Cols Master v3 v3_var v3_var_noinl
50 2421ms +4.0% +6.7% +5.8%
100 4980ms +0.1% +2.0% +0.1%
200 1901ms +1.4% +3.5% +1.4%
500 2328ms +1.8% +2.7% +2.2%
1000 1815ms +2.0% +2.8% +2.5%
I'm not sure whether there's a practical difference between v3 and v3_var;
what do you think?
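For reference, the constant-propagation pattern in miniature (illustrative names only; the GCC/Clang attribute stands in for pg_attribute_always_inline, and the loops stand in for the real SIMD/scalar paths):

```c
#include <assert.h>
#include <stddef.h>

/* Because process_attr is always-inlined and each call site passes a
 * literal, the compiler folds use_simd to a constant and deletes the
 * branch inside each specialized body. */
static inline __attribute__((always_inline)) size_t
process_attr(const char *s, size_t len, int use_simd)
{
	size_t		i = 0;

	(void) s;
	if (use_simd)
	{
		/* stand-in for the SIMD skip loop: advance a vector at a time */
		while (i + 16 <= len)
			i += 16;
	}
	/* scalar tail (the whole loop in the non-SIMD specialization) */
	while (i < len)
		i++;
	return i;
}

static size_t
write_attribute(const char *s, size_t len, int eligible)
{
	/* branch once out here; the inlined bodies below are specialized */
	if (eligible)
		return process_attr(s, len, 1);
	return process_attr(s, len, 0);
}
```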
Regards,
Ayoub
On Wed, Mar 18, 2026 at 12:02 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
Just a small concern about where some varlenas have a larger binary size
than its text representation ex:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
pg_column_size
----------------
32

its text representation is less than sizeof(Vector8) so currently v3 would
enter SIMD path and exit out just from the beginning (two extra branches)
because it does this:

+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+     VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))

I thought maybe we could do * 2 or * 4 its binary size, depends on the type
really but this is just a proposition if this case is something concerning.

Can we measure the impact of this? How likely is this case?
I'll respond to this separately in a different email.
My example was actually incorrect (the text representation is lexemes and
positions, not the text as we typed it; it's lossy), but the point still
holds.
If we have some json(b) column like {"key1":"val1","key2":"val2"}, the CSV
format would immediately exit the SIMD path because of the quote character,
and for json(b) this is always going to be the case.
I measured the overhead of exiting the SIMD path a lot (8 million times for
one COPY TO command); I only found a 3% regression for this case, sometimes
2%.
For cases where we make a false commitment to SIMD because we read a binary
size >= sizeof(Vector8), which I also found very niche, the short circuit
to scalar each time is even more negligible (the above CSV JSON case is the
absolute worst case).
So I don't think any of this should be a concern.
Regards,
Ayoub
On Wed, Mar 18, 2026 at 3:29 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
On Wed, Mar 18, 2026 at 12:02 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
Just a small concern about where some varlenas have a larger binary
size
than its text representation ex:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
pg_column_size
----------------
32

its text representation is less than sizeof(Vector8) so currently v3 would
enter SIMD path and exit out just from the beginning (two extra branches)
because it does this:

+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+     VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))

I thought maybe we could do * 2 or * 4 its binary size, depends on the type
really but this is just a proposition if this case is something concerning.
Can we measure the impact of this? How likely is this case?
I'll respond to this separately in a different email.
My example was already incorrect (the text representation is lexems and
positions, not the text we read as it is, its lossy), anyways the point
still holds.
If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
CSV format this would immediately exit the SIMD path because of quote
character, for json(b) this is going to be always the case.
I measured the overhead of exiting the SIMD path a lot (8 million times
for one COPY TO command), i only found 3% regression for this case,
sometimes 2%.For cases where we do a false commitment on SIMD because we read a binary
size >= sizeof(Vector8), which i found very niche too, the short circuit to
scalar each time is even more negligible (the above CSV JSON case is the
absolute worst case).
So I don't think any of this should be a concern.

Regards,
Ayoub
Rebased patch.
Regards,
Ayoub
Attachments:
v3-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch (text/x-patch; charset=US-ASCII) +236-19
On Wed, Mar 18, 2026 at 12:02:28AM +0100, KAZAR Ayoub wrote:
Test Master v3 v3_var v3_var_noinl
TEXT clean 1504ms -24.1% -23.0% -21.5%
CSV clean 1760ms -34.9% -32.7% -33.0%
Nice!
TEXT 1/3 backslashes 3763ms +4.6% +6.9% +4.1%
CSV 1/3 quotes 3885ms +3.1% +2.7% -0.8%
Hm. These seem a little bit beyond what we could ignore as noise.
Wide table TEXT (integer columns):
Cols Master v3 v3_var v3_var_noinl
50 2083ms -0.7% -0.6% +3.5%
100 4094ms -0.1% -0.5% +4.5%
200 1560ms +0.6% -2.3% +3.2%
500 1905ms -1.0% -1.3% +4.7%
1000 1455ms +1.8% +0.4% +4.3%
These numbers look roughly within the noise range.
Wide table CSV:
Cols Master v3 v3_var v3_var_noinl
50 2421ms +4.0% +6.7% +5.8%
Hm. Is this reproducible? A 4% regression is a bit worrisome.
100 4980ms +0.1% +2.0% +0.1%
200 1901ms +1.4% +3.5% +1.4%
500 2328ms +1.8% +2.7% +2.2%
1000 1815ms +2.0% +2.8% +2.5%
These numbers don't bother me too much, but maybe there are some ways to
minimize the regressions further.
--
nathan
On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
CSV format this would immediately exit the SIMD path because of quote
character, for json(b) this is going to be always the case.
I measured the overhead of exiting the SIMD path a lot (8 million times for
one COPY TO command), i only found 3% regression for this case, sometimes
2%.
I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads. Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes. The extra branching for each attribute
might not be something we can just ignore.
For cases where we do a false commitment on SIMD because we read a binary
size >= sizeof(Vector8), which i found very niche too, the short circuit to
scalar each time is even more negligible (the above CSV JSON case is the
absolute worst case).
That's good to hear.
--
nathan
Hello,
On Thu, Mar 26, 2026 at 10:23 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
CSV format this would immediately exit the SIMD path because of quote
character, for json(b) this is going to be always the case.
I measured the overhead of exiting the SIMD path a lot (8 million times for
one COPY TO command), i only found 3% regression for this case, sometimes
2%.

I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads. Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes. The extra branching for each attribute
might not be something we can just ignore.
Thanks for the review.
I added a prescan loop inside the SIMD helpers that tries to catch special
chars within the first sizeof(Vector8) characters, and I measured how good
it is at reducing the overhead of starting SIMD and exiting at the first
vector: the scalar loop beats SIMD over a single vector if it finds a
special character before the 6th character; in the worst case (a clean
vector) the scalar loop needs 20 more cycles compared to SIMD.
This helps mitigate the JSON(B)-in-CSV case, which is why I added it for
the CSV format only.
In a benchmark with 10M early SIMD exit like the JSONB case, the previous
3% regression is gone.
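The prescan amounts to something like this (a hedged sketch; the name worth_simd_csv and the 16-byte constant stand in for the patch's Vector8 machinery):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VECTOR_BYTES 16			/* stands in for sizeof(Vector8) */

/* Cheaply scan the first vector's worth of bytes with scalar code
 * before committing to SIMD.  If a quote or newline shows up early (as
 * it always does for JSONB output in CSV), we never pay the SIMD
 * setup-and-bail cost. */
static bool
worth_simd_csv(const char *s, size_t len, char quotec)
{
	size_t		n = len < VECTOR_BYTES ? len : VECTOR_BYTES;

	for (size_t i = 0; i < n; i++)
	{
		if (s[i] == quotec || s[i] == '\n' || s[i] == '\r')
			return false;		/* early special char: stay scalar */
	}
	return len >= VECTOR_BYTES;	/* clean first vector: SIMD should pay */
}
```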
For the normal benchmarks (clean, 1/3 specials, wide tables), I ran v4 for
longer this time and found:
Test Master V4
TEXT clean 1619ms -28.0%
CSV clean 1866ms -37.1%
TEXT 1/3 backslashes 3913ms +1.2%
CSV 1/3 quotes 4012ms -3.0%
Wide table TEXT:
Cols Master V4
50 2109ms -2.9%
100 2029ms -1.6%
200 3982ms -2.9%
500 1962ms -6.1%
1000 3812ms -3.6%
Wide table CSV:
Cols Master V4
50 2531ms +0.3%
100 2465ms +1.1%
200 4965ms -0.2%
500 2346ms +1.4%
1000 4709ms -0.4%
Do we need more benchmarks for other kinds of workloads? Maybe I'm missing
something else that has noticeable overhead?
Regards,
Ayoub
Attachments:
v4-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch (text/x-patch; charset=US-ASCII) +262-19
On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
I added a prescan loop inside the simd helpers trying to catch special
chars in sizeof(Vector8) characters, i measured how good is this at
reducing the overhead of starting simd and exiting at first vector:
the scalar loop is better than SIMD for one vector if it finds a special
character before 6th character, worst case is not a clean vector, where the
scalar loop needs 20 more cycles compared to SIMD.
This helps mitigate the case of JSON(B) in CSV format, this is why I only
added this for CSV case only.
Interesting.
In a benchmark with 10M early SIMD exit like the JSONB case, the previous
3% regression is gone.
While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.
--
nathan
On Tue, Mar 31, 2026 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
I added a prescan loop inside the simd helpers trying to catch special
chars in sizeof(Vector8) characters, i measured how good is this at
reducing the overhead of starting simd and exiting at first vector:
the scalar loop is better than SIMD for one vector if it finds a special
character before 6th character, worst case is not a clean vector, where the
scalar loop needs 20 more cycles compared to SIMD.
This helps mitigate the case of JSON(B) in CSV format, this is why I only
added this for CSV case only.

Interesting.

In a benchmark with 10M early SIMD exit like the JSONB case, the previous
3% regression is gone.

While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.
Thanks for the review.
Fair enough, I'll try many more cases in the upcoming weeks to make sure
we're not missing anything.
Regards,
Ayoub