[POC] verifying UTF-8 using SIMD instructions

Started by John Naylor, about 5 years ago, 80 messages, pgsql-hackers
#1 John Naylor
john.naylor@enterprisedb.com

Hi,

As of b80e10638e3, there is a new API for validating the encoding of
strings, and one of the side effects is that we have a wider choice of
algorithms. For UTF-8, it has been demonstrated that SIMD is much faster at
decoding [1] and validation [2] than the standard approach we use.

It makes sense to start with the ascii subset of UTF-8 for a couple
reasons. First, ascii is very widespread in database content, particularly
in bulk loads. Second, ascii can be validated using the simple SSE2
intrinsics that come with (I believe) any x86-64 chip, and I'm guessing we
can detect that at compile time and not mess with runtime checks. The
examples above using SSE for the general case are much more complicated and
involve SSE 4.2 or AVX.
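The SSE2-only ASCII check described above can be sketched in a few lines (a hypothetical helper, not the actual patch): the high bit of every byte is zero exactly when the chunk is pure ASCII, and _mm_movemask_epi8 gathers those high bits in a single instruction.

```c
#include <emmintrin.h>          /* SSE2 intrinsics */
#include <stdbool.h>

/* Hypothetical sketch, not from the patch: true if all 16 bytes
 * starting at s are ASCII.  _mm_movemask_epi8 collects the high bit
 * of each byte; any set bit indicates a non-ASCII byte. */
static bool
chunk_is_ascii_sse2(const unsigned char *s)
{
    __m128i chunk = _mm_loadu_si128((const __m128i *) s);

    return _mm_movemask_epi8(chunk) == 0;
}
```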

Here are some numbers on my laptop (MacOS/clang 10 -- if the concept is
okay, I'll do Linux/gcc and add more inputs). The test is the same as
Heikki shared in [3], but I added a case with >95% Chinese characters just
to show how that compares to the mixed ascii/multibyte case.

master:

 chinese | mixed | ascii
---------+-------+-------
    1081 |   761 |   366

patch:

 chinese | mixed | ascii
---------+-------+-------
    1103 |   498 |    51

The speedup in the pure ascii case is nice.

In the attached POC, I just have a pro forma portability stub, and left
full portability detection for later. The fast path is inlined inside
pg_utf8_verifystr(). I imagine the ascii fast path could be abstracted into
a separate function to which is passed a function pointer for full encoding
validation. That would allow other encodings with strict ascii subsets to
use this as well, but coding that abstraction might be a little messy, and
b80e10638e3 already gives a performance boost over PG13.

I also gave a shot at doing full UTF-8 recognition using a DFA, but so far
that has made performance worse. If I ever have more success with that,
I'll add that in the mix.

[1]: https://woboq.com/blog/utf-8-processing-using-simd.html
[2]: https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/
[3]: /messages/by-id/06d45421-61b8-86dd-e765-f1ce527a5a2f@iki.fi

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v1-verify-utf8-sse-ascii.patch (application/x-patch, +81/-2)
#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: John Naylor (#1)
Re: [POC] verifying UTF-8 using SIMD instructions

On 01/02/2021 19:32, John Naylor wrote:

It makes sense to start with the ascii subset of UTF-8 for a couple
reasons. First, ascii is very widespread in database content,
particularly in bulk loads. Second, ascii can be validated using the
simple SSE2 intrinsics that come with (I believe) any x86-64 chip, and
I'm guessing we can detect that at compile time and not mess with
runtime checks. The examples above using SSE for the general case are
much more complicated and involve SSE 4.2 or AVX.

I wonder how using SSE compares with dealing with 64 or 32-bit words at
a time, using regular instructions? That would be more portable.

Here are some numbers on my laptop (MacOS/clang 10 -- if the concept is
okay, I'll do Linux/gcc and add more inputs). The test is the same as
Heikki shared in [3], but I added a case with >95% Chinese characters
just to show how that compares to the mixed ascii/multibyte case.

master:

 chinese | mixed | ascii
---------+-------+-------
    1081 |   761 |   366

patch:

 chinese | mixed | ascii
---------+-------+-------
    1103 |   498 |    51

The speedup in the pure ascii case is nice.

Yep.

In the attached POC, I just have a pro forma portability stub, and left
full portability detection for later. The fast path is inlined inside
pg_utf8_verifystr(). I imagine the ascii fast path could be abstracted
into a separate function to which is passed a function pointer for full
encoding validation. That would allow other encodings with strict ascii
subsets to use this as well, but coding that abstraction might be a
little messy, and b80e10638e3 already gives a performance boost over PG13.

All supported encodings are ASCII subsets. Might be best to put the
ASCII-check into a static inline function and use it in all the verify
functions. I presume it's only a few instructions, and these functions
can be pretty performance sensitive.

I also gave a shot at doing full UTF-8 recognition using a DFA, but so
far that has made performance worse. If I ever have more success with
that, I'll add that in the mix.

That's disappointing. Perhaps the SIMD algorithms have higher startup
costs, so that you need longer inputs to benefit? In that case, it might
make sense to check the length of the input and only use the SIMD
algorithm if the input is long enough.

- Heikki

#3 John Naylor
john.naylor@enterprisedb.com
In reply to: Heikki Linnakangas (#2)
Re: [POC] verifying UTF-8 using SIMD instructions

On Mon, Feb 1, 2021 at 2:01 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 01/02/2021 19:32, John Naylor wrote:

It makes sense to start with the ascii subset of UTF-8 for a couple
reasons. First, ascii is very widespread in database content,
particularly in bulk loads. Second, ascii can be validated using the
simple SSE2 intrinsics that come with (I believe) any x86-64 chip, and
I'm guessing we can detect that at compile time and not mess with
runtime checks. The examples above using SSE for the general case are
much more complicated and involve SSE 4.2 or AVX.

I wonder how using SSE compares with dealing with 64 or 32-bit words at
a time, using regular instructions? That would be more portable.

I gave that a shot, and it's actually pretty good. According to this paper
[1], the same technique also works with ordinary 64-bit registers, so I
tried both 16 and 8 bytes.

All supported encodings are ASCII subsets. Might be best to put the
ASCII-check into a static inline function and use it in all the verify
functions. I presume it's only a few instructions, and these functions
can be pretty performance sensitive.

I tried both the static inline function and also putting the whole
optimized utf-8 loop in a separate function to which the caller passes a
pointer to the appropriate pg_*_verifychar().

In the table below, "inline" refers to coding directly inside
pg_utf8_verifystr(). Both C and SSE are in the same patch, with an #ifdef.
I didn't bother splitting them out because for other encodings, we want one
of the other approaches above. For those, "C retail" refers to a static
inline function to code the contents of the inner loop, if I understood
your suggestion correctly. This needs more boilerplate in each function, so
I don't prefer this. "C func pointer" refers to the pointer approach I just
mentioned. That is the cleanest looking way to generalize it, so I only
tested that version with different strides -- 8 and 16 bytes.

This is the same test I used earlier, which is the test in [2] but adding
an almost-pure multibyte Chinese text of about the same size.

x86-64 Linux gcc 8.4.0:

 build            | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    1480 |   848 |   428
 inline SSE       |    1617 |   634 |    63
 inline C         |    1481 |   843 |    50
 C retail         |    1493 |   838 |    49
 C func pointer   |    1467 |   851 |    49
 C func pointer 8 |    1518 |   757 |    56

x86-64 MacOS clang 10.0.0:

 build            | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    1086 |   760 |   374
 inline SSE       |    1081 |   529 |    70
 inline C         |    1093 |   649 |    49
 C retail         |    1132 |   695 |   152
 C func pointer   |    1085 |   609 |    59
 C func pointer 8 |    1099 |   571 |    71

PowerPC-LE Linux gcc 4.8.5:

 build            | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    2961 |  1525 |   871
 inline SSE       |   (n/a) | (n/a) | (n/a)
 inline C         |    2911 |  1329 |    80
 C retail         |    2838 |  1311 |   102
 C func pointer   |    2828 |  1314 |    80
 C func pointer 8 |    3143 |  1249 |   133

Looking at the results, the main advantage of SSE here is it's more robust
for mixed inputs. If a 16-byte chunk is not ascii-only but contains a block
of ascii at the front, we can skip those with a single CPU instruction, but
in C, we have to verify the whole chunk using the slow path.
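To illustrate the point about skipping the ASCII prefix in one instruction, here is a sketch (hypothetical names, assuming SSE2 plus the GCC/Clang __builtin_ctz builtin; the patch itself may do this differently): the movemask tells us exactly how many leading ASCII bytes a chunk has.

```c
#include <emmintrin.h>          /* SSE2 intrinsics */

/* Sketch, not the patch: length of the pure-ASCII prefix of a
 * 16-byte chunk.  A zero mask means the whole chunk is ASCII;
 * otherwise the lowest set bit marks the first non-ASCII byte. */
static int
ascii_prefix_len(const unsigned char *s)
{
    __m128i chunk = _mm_loadu_si128((const __m128i *) s);
    int     mask = _mm_movemask_epi8(chunk);

    if (mask == 0)
        return 16;              /* whole chunk is ASCII */
    return __builtin_ctz(mask); /* index of first high-bit byte */
}
```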

The "C func pointer approach" seems to win out over the "C retail" approach
(static inline function).

Using an 8-byte stride is slightly better for mixed inputs on all platforms
tested, but regresses on pure ascii and also seems to regress on pure
multibyte. The difference in the multibyte case is small enough that it
could be random, but it happens on two platforms, so I'd say it's real. On
the other hand, pure multibyte is not as common as mixed text.

Overall, I think the function pointer approach with an 8-byte stride is the
best balance. If that's agreeable, next I plan to test with short inputs,
because I think we'll want a guard if-statement to only loop through the
fast path if the string is long enough to justify that.

I also gave a shot at doing full UTF-8 recognition using a DFA, but so
far that has made performance worse. If I ever have more success with
that, I'll add that in the mix.

That's disappointing. Perhaps the SIMD algorithms have higher startup
costs, so that you need longer inputs to benefit? In that case, it might
make sense to check the length of the input and only use the SIMD
algorithm if the input is long enough.

I changed topics a bit quickly, but here I'm talking about using a
table-driven state machine to verify the multibyte case. It's possible I
did something wrong, since my model implementation decodes, and having to
keep track of how many bytes got verified might be the culprit. I'd like to
try again to speed up multibyte, but that might be a PG15 project.

[1]: https://arxiv.org/abs/2010.03090
[2]: /messages/by-id/06d45421-61b8-86dd-e765-f1ce527a5a2f@iki.fi

--
John Naylor
EDB: http://www.enterprisedb.com

#4 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#3)
Re: [POC] verifying UTF-8 using SIMD instructions

Here is a more polished version of the function pointer approach, now
adapted to all multibyte encodings. Using the not-yet-committed tests from
[1], I found a thinko bug that resulted in the test for nul bytes to not
only be wrong, but probably also elided by the compiler. Doing it correctly
is noticeably slower on pure ascii, but still several times faster than
before, so the conclusions haven't changed any. I'll run full measurements
later this week, but I'll share the patch now for review.

[1]: /messages/by-id/11d39e63-b80a-5f8d-8043-fff04201fadc@iki.fi

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch (application/octet-stream, +159/-23)
#5 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: John Naylor (#4)
Re: [POC] verifying UTF-8 using SIMD instructions

On 07/02/2021 22:24, John Naylor wrote:

Here is a more polished version of the function pointer approach, now
adapted to all multibyte encodings. Using the not-yet-committed tests
from [1], I found a thinko bug that resulted in the test for nul bytes
to not only be wrong, but probably also elided by the compiler. Doing it
correctly is noticeably slower on pure ascii, but still several times
faster than before, so the conclusions haven't changed any. I'll run
full measurements later this week, but I'll share the patch now for review.

As a quick test, I hacked up pg_utf8_verifystr() to use Lemire's
algorithm from the simdjson library [1], see attached patch. I
microbenchmarked it using the same test I used before [2].

These results are with "gcc -O2" using "gcc (Debian 10.2.1-6) 10.2.1
20210110"

unpatched master:

postgres=# \i mbverifystr-speed.sql
CREATE FUNCTION
 mixed | ascii
-------+-------
   728 |   393
(1 row)

v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch:

 mixed | ascii
-------+-------
   759 |    98
(1 row)

simdjson-utf8-hack.patch:

 mixed | ascii
-------+-------
    53 |    31
(1 row)

So clearly that algorithm is fast. Not sure if it has a high startup
cost, or large code size, or other tradeoffs that we don't want. At
least it depends on SIMD instructions, so it requires more code for the
architecture-specific implementations and autoconf logic and all that.
Nevertheless I think it deserves a closer look; I'm a bit reluctant to
put in half-way measures when there's a clearly superior algorithm out
there.

I also tested the fallback implementation from the simdjson library
(included in the patch, if you uncomment it in simdjson-glue.c):

 mixed | ascii
-------+-------
   447 |    46
(1 row)

I think we should at least try to adopt that. At a high level, it looks
pretty similar to your patch: you load the data 8 bytes at a time, check if
they are all ASCII. If there are any non-ASCII chars, you check the
bytes one by one, otherwise you load the next 8 bytes. Your patch should
be able to achieve the same performance, if done right. I don't think
the simdjson code forbids \0 bytes, so that will add a few cycles, but
still.
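The 8-bytes-at-a-time check described here can be sketched portably like this (illustrative names, not the actual fallback; memcpy keeps the load safe on strict-alignment machines):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the portable idea: fetch 8 bytes with memcpy (safe on
 * strict-alignment platforms) and test all eight high bits at once.
 * A nonzero result under the mask means some byte is non-ASCII. */
static bool
chunk_is_ascii_word(const unsigned char *s)
{
    uint64_t word;

    memcpy(&word, s, sizeof(word));
    return (word & UINT64_C(0x8080808080808080)) == 0;
}
```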

[1]: https://github.com/simdjson/simdjson
[2]: /messages/by-id/06d45421-61b8-86dd-e765-f1ce527a5a2f@iki.fi

- Heikki

PS. Your patch as it stands isn't safe on systems with strict alignment,
the string passed to the verify function isn't guaranteed to be 8 bytes
aligned. Use memcpy to fetch the next 8-byte chunk to fix.

Attachments:

simdjson-utf8-hack.patch (text/x-patch, charset=UTF-8, +118/-2)
#6 John Naylor
john.naylor@enterprisedb.com
In reply to: Heikki Linnakangas (#5)
Re: [POC] verifying UTF-8 using SIMD instructions

On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

As a quick test, I hacked up pg_utf8_verifystr() to use Lemire's
algorithm from the simdjson library [1], see attached patch. I
microbenchmarked it using the same test I used before [2].

I've been looking at various iterations of Lemire's utf8 code, and trying
it out was next on my list, so thanks for doing that!

These results are with "gcc -O2" using "gcc (Debian 10.2.1-6) 10.2.1
20210110"

unpatched master:

postgres=# \i mbverifystr-speed.sql
CREATE FUNCTION
 mixed | ascii
-------+-------
   728 |   393
(1 row)

v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch:

 mixed | ascii
-------+-------
   759 |    98
(1 row)

Hmm, the mixed case got worse -- I haven't seen that in any of my tests.

simdjson-utf8-hack.patch:

 mixed | ascii
-------+-------
    53 |    31
(1 row)

So clearly that algorithm is fast. Not sure if it has a high startup
cost, or large code size, or other tradeoffs that we don't want.

The simdjson lib uses everything up through AVX512 depending on what
hardware is available. I seem to remember reading that high start-up cost
is more relevant to floating point than to integer ops, but I could be
wrong. Just the utf8 portion is surely tiny also.

At
least it depends on SIMD instructions, so it requires more code for the
architecture-specific implementations and autoconf logic and all that.

One of his earlier demos [1] (in simdutf8check.h) had a version that used
mostly SSE2 with just three intrinsics from SSSE3. That's widely available
by now. He measured that at 0.7 cycles per byte, which is still good
compared to AVX2's 0.45 cycles per byte [2].

Testing for three SSSE3 intrinsics in autoconf is pretty easy. I would
assume that if that check (and the corresponding runtime check) passes, we
can assume SSE2. That code has three licenses to choose from -- Apache 2,
Boost, and MIT. Something like that might be straightforward to start
from. I think the only obstacles to worry about are license and getting it
to fit into our codebase. Adding more than zero high-level comments with a
good description of how it works in detail is also a bit of a challenge.

I also tested the fallback implementation from the simdjson library
(included in the patch, if you uncomment it in simdjson-glue.c):

 mixed | ascii
-------+-------
   447 |    46
(1 row)

I think we should at least try to adopt that. At a high level, it looks
pretty similar to your patch: you load the data 8 bytes at a time, check if
they are all ASCII. If there are any non-ASCII chars, you check the
bytes one by one, otherwise you load the next 8 bytes. Your patch should
be able to achieve the same performance, if done right. I don't think
the simdjson code forbids \0 bytes, so that will add a few cycles, but
still.

Okay, I'll look into that.

PS. Your patch as it stands isn't safe on systems with strict alignment,
the string passed to the verify function isn't guaranteed to be 8 bytes
aligned. Use memcpy to fetch the next 8-byte chunk to fix.

Will do.

[1]: https://github.com/lemire/fastvalidate-utf-8/tree/master/include
[2]: https://lemire.me/blog/2018/10/19/validating-utf-8-bytes-using-only-0-45-cycles-per-byte-avx-edition/

--
John Naylor
EDB: http://www.enterprisedb.com

#7 John Naylor
john.naylor@enterprisedb.com
In reply to: Heikki Linnakangas (#5)
Re: [POC] verifying UTF-8 using SIMD instructions

On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I also tested the fallback implementation from the simdjson library
(included in the patch, if you uncomment it in simdjson-glue.c):

 mixed | ascii
-------+-------
   447 |    46
(1 row)

I think we should at least try to adopt that. At a high level, it looks
pretty similar to your patch: you load the data 8 bytes at a time, check if
they are all ASCII. If there are any non-ASCII chars, you check the
bytes one by one, otherwise you load the next 8 bytes. Your patch should
be able to achieve the same performance, if done right. I don't think
the simdjson code forbids \0 bytes, so that will add a few cycles, but
still.

That fallback is very similar to my "inline C" case upthread, and they both
actually check 16 bytes at a time (the comment is wrong in the patch you
shared). I can work back and show how the performance changes with each
difference (just MacOS, clang 10 here):

master:

 mixed | ascii
-------+-------
   757 |   366

v1, but using memcpy():

 mixed | ascii
-------+-------
   601 |   129

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93

inline ascii fastpath into pg_utf8_verifystr():

 mixed | ascii
-------+-------
   595 |    71

use 16-byte stride:

 mixed | ascii
-------+-------
   652 |    49

With this cpu/compiler, v1 is fastest on the mixed input all else being
equal.

Maybe there's a smarter way to check for zeros in C. Or maybe be more
careful about cache -- running memchr() on the whole input first might not
be the best thing to do.

--
John Naylor
EDB: http://www.enterprisedb.com

#8 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: John Naylor (#7)
Re: [POC] verifying UTF-8 using SIMD instructions

On 09/02/2021 22:08, John Naylor wrote:

Maybe there's a smarter way to check for zeros in C. Or maybe be more
careful about cache -- running memchr() on the whole input first might
not be the best thing to do.

The usual trick is the haszero() macro here:
https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. That's
how memchr() is typically implemented, too.
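For reference, the bithacks trick widened to 64 bits looks like this (a sketch; the macro on that page is written for 32-bit words, and the helper name here is made up):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The bithacks haszero() trick, 64-bit version: a byte becomes 0x80
 * in the result iff it was 0x00 in the input, so the whole expression
 * is nonzero iff the word contains a zero byte. */
#define haszero64(v) \
    (((v) - UINT64_C(0x0101010101010101)) & ~(v) & UINT64_C(0x8080808080808080))

/* Sketch: check an 8-byte chunk for an embedded NUL without a
 * per-byte loop; memcpy keeps the load alignment-safe. */
static bool
chunk_has_nul(const unsigned char *s)
{
    uint64_t word;

    memcpy(&word, s, sizeof(word));
    return haszero64(word) != 0;
}
```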

- Heikki

#9 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#6)
Re: [POC] verifying UTF-8 using SIMD instructions

I wrote:

One of his earlier demos [1] (in simdutf8check.h) had a version that used
mostly SSE2 with just three intrinsics from SSSE3. That's widely available
by now. He measured that at 0.7 cycles per byte, which is still good
compared to AVX2's 0.45 cycles per byte [2].

Testing for three SSSE3 intrinsics in autoconf is pretty easy. I would
assume that if that check (and the corresponding runtime check) passes, we
can assume SSE2. That code has three licenses to choose from -- Apache 2,
Boost, and MIT. Something like that might be straightforward to start from.
I think the only obstacles to worry about are license and getting it to fit
into our codebase. Adding more than zero high-level comments with a good
description of how it works in detail is also a bit of a challenge.

I double checked, and it's actually two SSSE3 intrinsics and one SSE4.1,
but the 4.1 one can be emulated with a few SSE2 intrinsics. But we could
probably fold all three into the SSE4.2 CRC check and have a single symbol
to save on boilerplate.
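The thread doesn't name the SSE4.1 intrinsic, but assuming it is the usual suspect, _mm_testz_si128 (which tests whether a & b is all zeros), the SSE2 emulation could look like this sketch:

```c
#include <emmintrin.h>          /* SSE2 intrinsics */

/* Sketch: emulate _mm_testz_si128(a, b) -- "is (a & b) all zeros?" --
 * using only SSE2, at the cost of a few extra instructions.
 * (Assumption: the SSE4.1 intrinsic in question is _mm_testz_si128;
 * the message above doesn't say which one it is.) */
static int
testz_sse2(__m128i a, __m128i b)
{
    __m128i anded = _mm_and_si128(a, b);
    __m128i zero = _mm_setzero_si128();

    /* (a & b) is all zeros <=> every byte compares equal to zero,
     * i.e. every lane of the comparison mask is set */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(anded, zero)) == 0xFFFF;
}
```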

I hacked that demo [1] into wchar.c (very ugly patch attached), and got the
following:

master:

 mixed | ascii
-------+-------
   757 |   366

Lemire demo:

 mixed | ascii
-------+-------
   172 |   168

This one lacks an ascii fast path, but the AVX2 version in the same file
has one that could probably be easily adapted. With that, I think this
would be worth adapting to our codebase and license. Thoughts?

The advantage of this demo is that it's not buried in a mountain of modern
C++.

Simdjson can use AVX -- do you happen to know which target it got compiled
to? AVX vectors are 256 bits wide and that requires OS support. The OSes we
care most about were updated 8-12 years ago, but that would still be
something to check, in addition to more configure checks.

[1]: https://github.com/lemire/fastvalidate-utf-8/tree/master/include

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

utf-sse42-demo.patch (application/octet-stream, +174/-0)
#10 John Naylor
john.naylor@enterprisedb.com
In reply to: Heikki Linnakangas (#8)
Re: [POC] verifying UTF-8 using SIMD instructions

On Tue, Feb 9, 2021 at 4:22 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 09/02/2021 22:08, John Naylor wrote:

Maybe there's a smarter way to check for zeros in C. Or maybe be more
careful about cache -- running memchr() on the whole input first might
not be the best thing to do.

The usual trick is the haszero() macro here:
https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. That's
how memchr() is typically implemented, too.

Thanks for that. Checking with that macro each loop iteration gives a small
boost:

v1, but using memcpy():

 mixed | ascii
-------+-------
   601 |   129

with haszero():

 mixed | ascii
-------+-------
   583 |   105

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93

--
John Naylor
EDB: http://www.enterprisedb.com

#11 John Naylor
john.naylor@enterprisedb.com
In reply to: Heikki Linnakangas (#5)
Re: [POC] verifying UTF-8 using SIMD instructions

On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I also tested the fallback implementation from the simdjson library
(included in the patch, if you uncomment it in simdjson-glue.c):

 mixed | ascii
-------+-------
   447 |    46
(1 row)

I think we should at least try to adopt that. At a high level, it looks
pretty similar to your patch: you load the data 8 bytes at a time, check if
they are all ASCII. If there are any non-ASCII chars, you check the
bytes one by one, otherwise you load the next 8 bytes. Your patch should
be able to achieve the same performance, if done right. I don't think
the simdjson code forbids \0 bytes, so that will add a few cycles, but
still.

Attached is a patch that does roughly what the simdjson fallback did, except
I use straight tests on the bytes and only calculate code points in assertion
use straight tests on the bytes and only calculate code points in assertion
builds. In the course of doing this, I found that my earlier concerns about
putting the ascii check in a static inline function were due to my
suboptimal loop implementation. I had assumed that if the chunked ascii
check failed, it had to check all those bytes one at a time. As it turns
out, that's a waste of the branch predictor. In the v2 patch, we do the
chunked ascii check every time we loop. With that, I can also confirm the
claim in the Lemire paper that it's better to do the check on 16-byte
chunks:

(MacOS, Clang 10)

master:

 chinese | mixed | ascii
---------+-------+-------
    1081 |   761 |   366

v2 patch, with 16-byte stride:

 chinese | mixed | ascii
---------+-------+-------
     806 |   474 |    83

patch but with 8-byte stride:

 chinese | mixed | ascii
---------+-------+-------
     792 |   490 |   105
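The loop restructuring described above (re-try the chunked ASCII check on every iteration instead of falling into a byte-at-a-time recovery loop) can be sketched as a toy validator. All names here are hypothetical, and the per-character check is deliberately simplified: it tests UTF-8 sequence structure only, not overlong forms or surrogates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8

/* Returns CHUNK if the next CHUNK bytes are all ASCII, else 0. */
static int
ascii_chunk_len(const unsigned char *s, size_t len)
{
    uint64_t word;

    if (len < CHUNK)
        return 0;
    memcpy(&word, s, CHUNK);
    return (word & UINT64_C(0x8080808080808080)) == 0 ? CHUNK : 0;
}

/* Verify one UTF-8 character; returns its length, or 0 if invalid.
 * Simplified: structural checks only. */
static int
verify_one_char(const unsigned char *s, size_t len)
{
    int l, i;

    if (s[0] < 0x80)
        l = 1;
    else if ((s[0] & 0xe0) == 0xc0)
        l = 2;
    else if ((s[0] & 0xf0) == 0xe0)
        l = 3;
    else if ((s[0] & 0xf8) == 0xf0)
        l = 4;
    else
        return 0;
    if ((size_t) l > len)
        return 0;
    for (i = 1; i < l; i++)
        if ((s[i] & 0xc0) != 0x80)
            return 0;
    return l;
}

/* The loop shape: chunked ASCII check every iteration, verifying at
 * most one multibyte character between chunk attempts. */
static bool
verify_str(const unsigned char *s, size_t len)
{
    while (len > 0)
    {
        int l = ascii_chunk_len(s, len);

        if (l == 0)
            l = verify_one_char(s, len);
        if (l == 0)
            return false;
        s += l;
        len -= l;
    }
    return true;
}
```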

I also included the fast path in all other multibyte encodings, and that is
also pretty good performance-wise. It regresses from master on pure
multibyte input, but that case is still faster than PG13, which I simulated
by reverting 6c5576075b0f9 and b80e10638e3:

~PG13:

 chinese | mixed | ascii
---------+-------+-------
    1565 |   848 |   365

ascii fast-path plus pg_*_verifychar():

 chinese | mixed | ascii
---------+-------+-------
    1279 |   656 |    94

v2 has a rough start to having multiple implementations in
src/backend/port. Next steps are:

1. Add more tests for utf-8 coverage (in addition to the ones to be added
by the noError argument patch)
2. Add SSE4 validator -- it turns out the demo I referred to earlier
doesn't match the algorithm in the paper. I plan to only copy the lookup
tables from simdjson verbatim, but the code will basically be written from
scratch, using simdjson as a hint.
3. Adjust configure.ac

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v2-add-portability-stub-and-new-fallback.patch (application/octet-stream, +425/-25)
#12 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: John Naylor (#11)
Re: [POC] verifying UTF-8 using SIMD instructions

On 13/02/2021 03:31, John Naylor wrote:

On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I also tested the fallback implementation from the simdjson library
(included in the patch, if you uncomment it in simdjson-glue.c):

 mixed | ascii
-------+-------
   447 |    46
(1 row)

I think we should at least try to adopt that. At a high level, it looks
pretty similar to your patch: you load the data 8 bytes at a time, check if
they are all ASCII. If there are any non-ASCII chars, you check the
bytes one by one, otherwise you load the next 8 bytes. Your patch should
be able to achieve the same performance, if done right. I don't think
the simdjson code forbids \0 bytes, so that will add a few cycles, but
still.

Attached is a patch that does roughly what the simdjson fallback did, except
I use straight tests on the bytes and only calculate code points in
assertion builds. In the course of doing this, I found that my earlier
concerns about putting the ascii check in a static inline function were
due to my suboptimal loop implementation. I had assumed that if the
chunked ascii check failed, it had to check all those bytes one at a
time. As it turns out, that's a waste of the branch predictor. In the v2
patch, we do the chunked ascii check every time we loop. With that, I
can also confirm the claim in the Lemire paper that it's better to do
the check on 16-byte chunks:

(MacOS, Clang 10)

master:

 chinese | mixed | ascii
---------+-------+-------
    1081 |   761 |   366

v2 patch, with 16-byte stride:

 chinese | mixed | ascii
---------+-------+-------
     806 |   474 |    83

patch but with 8-byte stride:

 chinese | mixed | ascii
---------+-------+-------
     792 |   490 |   105

I also included the fast path in all other multibyte encodings, and that
is also pretty good performance-wise.

Cool.

It regresses from master on pure
multibyte input, but that case is still faster than PG13, which I
simulated by reverting 6c5576075b0f9 and b80e10638e3:

I thought the "chinese" numbers above are pure multibyte input, and it
seems to do well on that. Where does it regress? In multibyte encodings
other than UTF-8? How bad is the regression?

I tested this on my first generation Raspberry Pi (chipmunk). I had to
tweak it a bit to make it compile, since the SSE autodetection code was
not finished yet. And I used generate_series(1, 1000) instead of
generate_series(1, 10000) in the test script (mbverifystr-speed.sql)
because this system is so slow.

master:

 mixed | ascii
-------+-------
  1310 |  1041
(1 row)

v2-add-portability-stub-and-new-fallback.patch:

 mixed | ascii
-------+-------
  2979 |   910
(1 row)

I'm guessing that's because the unaligned access in check_ascii() is
expensive on this platform.

- Heikki

#13 John Naylor
john.naylor@enterprisedb.com
In reply to: Heikki Linnakangas (#12)
Re: [POC] verifying UTF-8 using SIMD instructions

On Mon, Feb 15, 2021 at 9:18 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Attached is the first attempt at using SSE4 to do the validation, but first
I'll answer your questions about the fallback.

I should mention that v2 had a correctness bug for 4-byte characters that I
found when I was writing regression tests. It shouldn't materially affect
performance, however.

I thought the "chinese" numbers above are pure multibyte input, and it
seems to do well on that. Where does it regress? In multibyte encodings
other than UTF-8?

Yes, the second set of measurements was intended to represent multibyte
encodings other than UTF-8. But instead of using one of those encodings, I
simulated non-UTF-8 by copying the pattern used for those: in the loop,
check for ascii then either advance or verify one character. It was a quick
way to use the same test.

How bad is the regression?

I'll copy the measurements here together with master so it's easier to
compare:

~= PG13 (revert 6c5576075b0f9 and b80e10638e3):

 chinese | mixed | ascii
---------+-------+-------
    1565 |   848 |   365

master:

 chinese | mixed | ascii
---------+-------+-------
    1081 |   761 |   366

ascii fast-path plus pg_*_verifychar():

 chinese | mixed | ascii
---------+-------+-------
    1279 |   656 |    94

As I mentioned upthread, pure multibyte is still faster than PG13. Reducing
the ascii check to 8 bytes at a time might alleviate the regression.

I tested this on my first generation Raspberry Pi (chipmunk). I had to
tweak it a bit to make it compile, since the SSE autodetection code was
not finished yet. And I used generate_series(1, 1000) instead of
generate_series(1, 10000) in the test script (mbverifystr-speed.sql)
because this system is so slow.

master:

 mixed | ascii
-------+-------
  1310 |  1041
(1 row)

v2-add-portability-stub-and-new-fallback.patch:

 mixed | ascii
-------+-------
  2979 |   910
(1 row)

I'm guessing that's because the unaligned access in check_ascii() is
expensive on this platform.

Hmm, I used memcpy() as suggested. Is that still slow on that platform?
That's 32-bit, right? Some possible remedies:

1) For the COPY FROM case, we should align the allocation on a cacheline --
we already have examples of that idiom elsewhere. I was actually going to
suggest doing this anyway, since unaligned SIMD loads are often slower, too.

2) As the simdjson fallback was based on Fuchsia (the Lemire paper implies
it was tested carefully on Arm and I have no reason to doubt that), I could
try to follow that example more faithfully by computing the actual
codepoints. It's more computation and just as many branches as far as I can
tell, but it's not a lot of work. I can add that alternative fallback to
the patch set. I have no Arm machines, but I can test on a POWER8 machine.

3) #ifdef out the ascii check for 32-bit platforms.

4) Same as the non-UTF8 case -- only check for ascii 8 bytes at a time.
I'll probably try this first.

Now, I'm pleased to report that I got SSE4 working. It still needs some
stress testing to flush out corner-case bugs, but it shouldn't be too early
to share some numbers on Clang 10 / macOS:

master:

chinese | mixed | ascii
---------+-------+-------
1082 | 751 | 364

v3 with SSE4.1:

chinese | mixed | ascii
---------+-------+-------
127 | 128 | 126

Some caveats and notes:

- It takes almost no recognizable code from simdjson, but it does take the
magic-constant lookup tables almost verbatim. The main body of the code
has no intrinsics at all (I think); they're all hidden inside static inline
helper functions. I reused some cryptic variable names from simdjson. It's
a bit messy, but not terrible.

- It diffs against the noError conversion patch and adds additional tests.

- It's not smart enough to stop at the last valid character boundary --
it's either all-valid or it must start over with the fallback. That will
have to change in order to work with the proposed noError conversions. It
shouldn't be very hard, but needs thought as to the clearest and safest way
to code it.

- There is no ascii fast-path yet. With this algorithm we have to be a bit
more careful since a valid ascii chunk could be preceded by an incomplete
sequence at the end of the previous chunk. Not too hard, just a bit more
work.

- This is my first time hacking autoconf, and it still seems slightly
broken, yet functional on my machine at least.

- It only needs SSE4.1, but I didn't want to create a whole new set of
CFLAGS, so it just reuses SSE4.2 for the runtime check and the macro names.
Also, it doesn't test for SSE2; it just insists on 64-bit for the runtime
check. I imagine it would refuse to build on 32-bit machines if you passed
it -msse42.

- There is a placeholder for Windows support, but it's not developed.

- I had to add a large number of casts to get rid of warnings in the magic
constants macros. That needs some polish.

I also attached a C file that visually demonstrates every step of the
algorithm following the example found in Table 9 in the paper. That
contains the skeleton coding I started with and got abandoned early, so it
might differ from the actual patch.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v3-SSE4-with-autoconf-support.patch (+1131 -26)
test-utf8.c
#14 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#13)
Re: [POC] verifying UTF-8 using SIMD instructions

I wrote:

[v3]
- It's not smart enough to stop at the last valid character boundary --

it's either all-valid or it must start over with the fallback. That will
have to change in order to work with the proposed noError conversions. It
shouldn't be very hard, but needs thought as to the clearest and safest way
to code it.

In v4, it should be able to return an accurate count of valid bytes even
when the end crosses a character boundary.

- This is my first time hacking autoconf, and it still seems slightly

broken, yet functional on my machine at least.

It was actually completely broken if you tried to pass the special flags to
configure. I redesigned this part and it seems to work now.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v4-SSE4-with-autoconf-support.patch (+1077 -66)
#15 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#13)
Re: [POC] verifying UTF-8 using SIMD instructions

On Mon, Feb 15, 2021 at 9:32 PM John Naylor <john.naylor@enterprisedb.com>
wrote:

On Mon, Feb 15, 2021 at 9:18 AM Heikki Linnakangas <hlinnaka@iki.fi>

wrote:

I'm guessing that's because the unaligned access in check_ascii() is
expensive on this platform.

Some possible remedies:

3) #ifdef out the ascii check for 32-bit platforms.

4) Same as the non-UTF8 case -- only check for ascii 8 bytes at a time.

I'll probably try this first.

I've attached a couple of patches to try on top of v4; maybe they'll help
the Arm32 regression. 01 reduces the stride to 8 bytes, and 02 applies on
top of 01 to disable the fallback fast path entirely on 32-bit platforms. A
bit of a heavy hammer, but it'll confirm (or not) your theory about
unaligned loads.

Also, I've included patches to explain more fully how I modeled non-UTF-8
performance while still using the UTF-8 tests. I think it was a useful
thing to do, and I have a theory that might predict how a non-UTF8 encoding
will perform with the fast path.

03A and 03B conflict with each other, so apply only one; each applies on
top of v4 (02 is not needed). Both replace the v4 fallback with the ascii
fast path + pg_utf8_verifychar() in the loop, similar to UTF-8 on master.
03A has a local static copy of pg_utf8_islegal(), and 03B uses the existing
global function. (On x86, you can disable SSE4 by passing
USE_FALLBACK_UTF8=1 to configure.)

While Clang 10 regressed for me on pure multibyte in a similar test
upthread, on Linux gcc 8.4 there isn't a regression at all. IIRC, gcc
wasn't as good as Clang when the API changed a few weeks ago, so even its
regression from v4 is still faster than master. Clang only regressed with
my changes because it somehow handled master much better to begin with.

x86-64 Linux gcc 8.4

master

chinese | mixed | ascii
---------+-------+-------
1453 | 857 | 428

v4 (fallback verifier written as a single function)

chinese | mixed | ascii
---------+-------+-------
815 | 514 | 82

v4 plus addendum 03A -- emulate non-utf-8 using a copy of
pg_utf8_islegal() as a static function

chinese | mixed | ascii
---------+-------+-------
1115 | 547 | 87

v4 plus addendum 03B -- emulate non-utf-8 using pg_utf8_islegal() as a
global function

chinese | mixed | ascii
---------+-------+-------
1279 | 604 | 82

(I also tried the same on ppc64le Linux, gcc 4.8.5 and while not great, it
never got worse than master either on pure multibyte.)

This is supposed to model the performance of a non-UTF-8 encoding, where we
don't have a bespoke function written from scratch. Here's my theory: if an
encoding's pg_*_verifychar() calls a global function such as pg_*_mblen(),
it won't benefit as much from an ascii fast path as one whose
pg_*_verifychar() has no function calls. I'm not sure whether a compiler
can inline a global function's body into call sites in the unit where it's
defined. (I haven't looked at the assembly.) But recall that you didn't
commit 0002 from the earlier encoding change because it didn't perform
well. I looked at that patch again, and while it inlined the
pg_utf8_verifychar() call, it still called the global function
pg_utf8_islegal().

If the above is anything to go by, on gcc at least, I don't think we need
to worry about a regression when adding an ascii fast path to non-utf-8
multibyte encodings.

Regarding SSE, I've added an ascii fast path in my local branch, but it's
not going to be as big a difference because 1) the check is more expensive
in terms of branches than the C case, and 2) because the general case is so
fast already, it's hard to improve upon. I just need to do some testing and
cleanup on the whole thing, and that'll be ready to share.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

addendum-01-8-byte-stride.patch (+6 -7)
addendum-02-remove-ascii-fast-path-32-bit.patch (+4 -1)
addendum-03A-emulate-non-utf8-multibyte-STATIC.patch (+87 -74)
addendum-03B-emulate-non-utf8-multibyte-GLOBAL.patch (+31 -74)
#16 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#13)
Re: [POC] verifying UTF-8 using SIMD instructions

I made some substantial improvements in v5, and I've taken care of all my
TODOs below. I separated out the non-UTF-8 ascii fast path into a separate
patch, since it's kind of off-topic, and it's not yet clear it's always the
best thing to do.

- It takes almost no recognizable code from simdjson, but it does take

the magic constants lookup tables almost verbatim. The main body of the
code has no intrinsics at all (I think). They're all hidden inside static
inline helper functions. I reused some cryptic variable names from
simdjson. It's a bit messy but not terrible.

In v5, the lookup tables and their comments are cleaned up and modified to
play nice with pgindent.

- It diffs against the noError conversion patch and adds additional tests.

I wanted to get some cfbot testing, so I went ahead and prepended v4 of
Heikki's noError patch so it would apply against master.

- There is no ascii fast-path yet. With this algorithm we have to be a

bit more careful since a valid ascii chunk could be preceded by an
incomplete sequence at the end of the previous chunk. Not too hard, just a
bit more work.

v5 adds an ascii fast path.

- I had to add a large number of casts to get rid of warnings in the

magic constants macros. That needs some polish.

This is much nicer now, only one cast really necessary.

I'm pretty pleased with how it is now, but it could use some thorough
testing for correctness. I'll work on that a bit later.

On my laptop, Clang 10:

master:

chinese | mixed | ascii
---------+-------+-------
1081 | 761 | 366

v5:

chinese | mixed | ascii
---------+-------+-------
136 | 93 | 54

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v4-0001-Add-noError-argument-to-encoding-conversion-funct.patch (+2322 -629)
v5-0002-Use-SSE-4-for-verifying-UTF-8-text.patch (+1084 -68)
v5-0003-Add-an-ASCII-fast-path-to-non-UTF-8-encoding-veri.patch (+90 -1)
#17 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#16)
Re: [POC] verifying UTF-8 using SIMD instructions

The cfbot reported a build failure on Windows because of the use of binary
literals. I've turned those into hex for v6, so let's see how far it gets
now.

I also decided to leave out the patch that adds an ascii fast path to
non-UTF-8 encodings. That would really require more testing than I have
time for.

As before, 0001 is v4 of Heikki's noError conversion patch, whose
regression tests I build upon.

0002 has no ascii fast path in the fallback implementation. 0003 and 0004
add it back in using 8- and 16-byte strides, respectively. That will make
it easier to test on non-Intel platforms, so we can decide which way to go
here. Also did a round of editing the comments in the SSE4.2 file.

I ran the multibyte conversion regression test found in the message below,
and it passed. That doesn't test UTF-8 explicitly, but all conversions
round-trip through UTF-8, so it does get some coverage.

/messages/by-id/b9e3167f-f84b-7aa4-5738-be578a4db924@iki.fi
--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v6-0001-Add-noError-argument-to-encoding-conversion-funct.patch (+2322 -629)
v6-0002-Use-SSE-4-for-verifying-UTF-8-text.patch (+1049 -68)
v6-0003-Add-an-ASCII-fast-path-to-the-fallback-UTF-8-vali.patch (+43 -2)
v6-0004-Widen-the-ASCII-fast-path-stride-in-the-fallback-.patch (+8 -7)
#18 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#17)
Re: [POC] verifying UTF-8 using SIMD instructions

v7 fixes an obvious mistake in Solution.pm

--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

v7-0001-Add-noError-argument-to-encoding-conversion-funct.patch (+2322 -629)
v7-0002-Use-SSE-4-for-verifying-UTF-8-text.patch (+1049 -68)
v7-0003-Add-an-ASCII-fast-path-to-the-fallback-UTF-8-vali.patch (+43 -2)
v7-0004-Widen-the-ASCII-fast-path-stride-in-the-fallback-.patch (+8 -7)
#19 Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: John Naylor (#18)
Re: [POC] verifying UTF-8 using SIMD instructions

Hi,

Just a quick question before I move on to review the patch ... The
improvement looks like it is only meant for x86 platforms. Can this be
done in a portable way by arranging for auto-vectorization ? Something
like commit 88709176236caf. This way it would benefit other platforms
as well.

I tried to compile the following code using -O3, and the assembly does
have vectorized instructions.

#include <stdio.h>

int main()
{
    int i;
    char s1[200] = "abcdewhruerhetr";
    char s2[200] = "oweurietiureuhtrethre";
    char s3[200] = {0};

    for (i = 0; i < sizeof(s1); i++)
    {
        s3[i] = s1[i] ^ s2[i];
    }

    printf("%s\n", s3);
    return 0;
}

#20 John Naylor
john.naylor@enterprisedb.com
In reply to: Amit Khandekar (#19)
Re: [POC] verifying UTF-8 using SIMD instructions

On Tue, Mar 9, 2021 at 5:00 AM Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

Hi,

Just a quick question before I move on to review the patch ... The
improvement looks like it is only meant for x86 platforms.

Actually it's meant to be faster for all platforms, since the C fallback is
quite a bit different from HEAD. I've found it to be faster on ppc64le. An
earlier version of the patch was a loser on 32-bit Arm because of alignment
issues, but if you could run the test script attached to [1] on 64-bit
Arm, I'd be curious to see how it does on 0002, and whether 0003 and 0004
make things better or worse. If there is trouble building on non-x86
platforms, I'd want to fix that also.

(Note: 0001 is not my patch, and I just include it for the tests)

Can this be
done in a portable way by arranging for auto-vectorization ? Something
like commit 88709176236caf. This way it would benefit other platforms
as well.

I'm fairly certain that the author of a compiler capable of doing that in
this case would be eligible for some kind of AI prize. :-)

[1] /messages/by-id/06d45421-61b8-86dd-e765-f1ce527a5a2f@iki.fi
--
John Naylor
EDB: http://www.enterprisedb.com

#21 Amit Khandekar (in reply to #20)
#22 John Naylor (in reply to #21)
#23 John Naylor (in reply to #22)
#24 John Naylor (in reply to #23)
#25 John Naylor (in reply to #24)
#26 Heikki Linnakangas (in reply to #25)
#27 Bruce Momjian (in reply to #26)
#28 Bruce Momjian (in reply to #27)
#29 John Naylor (in reply to #28)
#30 John Naylor (in reply to #29)
#31 John Naylor (in reply to #26)
#32 Heikki Linnakangas (in reply to #27)
#33 John Naylor (in reply to #32)
#34 Heikki Linnakangas (in reply to #33)
#35 Heikki Linnakangas (in reply to #34)
#36 John Naylor (in reply to #35)
#37 Heikki Linnakangas (in reply to #31)
#38 John Naylor (in reply to #37)
#39 Heikki Linnakangas (in reply to #38)
#40 John Naylor (in reply to #39)
#41 John Naylor (in reply to #40)
#42 John Naylor (in reply to #41)
#43 Heikki Linnakangas (in reply to #42)
#44 John Naylor (in reply to #43)
#45 John Naylor (in reply to #44)
#46 Amit Khandekar (in reply to #45)
#47 John Naylor (in reply to #46)
#48 John Naylor (in reply to #47)
#49 Vladimir Sitnikov (in reply to #48)
#50 John Naylor (in reply to #49)
#51 John Naylor (in reply to #50)
#52 John Naylor (in reply to #51)
#53 John Naylor (in reply to #50)
#54 Amit Khandekar (in reply to #51)
#55 Vladimir Sitnikov (in reply to #53)
#56 John Naylor (in reply to #55)
#57 John Naylor (in reply to #56)
#58 Thomas Munro (in reply to #22)
#59 Vladimir Sitnikov (in reply to #57)
#60 John Naylor (in reply to #59)
#61 John Naylor (in reply to #58)
#62 Thomas Munro (in reply to #61)
#63 John Naylor (in reply to #62)
#64 John Naylor (in reply to #60)
#65 Vladimir Sitnikov (in reply to #64)
#66 John Naylor (in reply to #65)
#67 John Naylor (in reply to #65)
#68 John Naylor (in reply to #66)
#69 John Naylor (in reply to #66)
#70 John Naylor (in reply to #69)
#71 John Naylor (in reply to #70)
#72 John Naylor (in reply to #71)
#73 Vladimir Sitnikov (in reply to #72)
#74 John Naylor (in reply to #73)
#75 John Naylor (in reply to #74)
#76 Heikki Linnakangas (in reply to #74)
#77 Godfrin, Philippe E (in reply to #76)
#78 John Naylor (in reply to #76)
#79 John Naylor (in reply to #78)
#80 John Naylor (in reply to #79)