a couple of small patches for simd.h

Started by Nathan Bossart4 months ago8 messages

nathandbossart@gmail.com

4 months ago

2 attachment(s)

0001 essentially reverts commit c6a43c2, and instead fixes the problem
(MSVC dislikes casts to the same type) by omitting the cast in the
problematic line in pg_lfind32(). While working on optimizing hex_encode()
and hex_decode() [0]https://commitfest.postgresql.org/patch/5538/, I noticed that implicit conversions sufficed.

0002 optimizes vector8_has_le() on AArch64 by using vminvq_u8(). I needed
vector8_has_ge() for the hex_encode()/hex_decode() work and noticed this
opportunity.

[0]: https://commitfest.postgresql.org/patch/5538/

--
nathan

Attachments:

v1-0001-Remove-vector32_is_highbit_set.patchtext/plain; charset=us-asciiDownload

From 699f0b0942c5390a46aa84ada95e5d38a9f21a10 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 22 Sep 2025 16:02:10 -0500
Subject: [PATCH v1 1/2] Remove vector32_is_highbit_set().

---
 src/include/port/pg_lfind.h |  2 +-
 src/include/port/simd.h     | 23 -----------------------
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index 20f7497dcb7..0f460472601 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -139,7 +139,7 @@ pg_lfind32_simd_helper(const Vector32 keys, const uint32 *base)
 	result = vector32_or(tmp1, tmp2);
 
 	/* return whether there was a match */
-	return vector32_is_highbit_set(result);
+	return vector8_is_highbit_set(result);
 }
 #endif							/* ! USE_NO_SIMD */
 
diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 97c5f353022..6424e8f5a97 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -78,7 +78,6 @@ static inline bool vector8_has_zero(const Vector8 v);
 static inline bool vector8_has_le(const Vector8 v, const uint8 c);
 static inline bool vector8_is_highbit_set(const Vector8 v);
 #ifndef USE_NO_SIMD
-static inline bool vector32_is_highbit_set(const Vector32 v);
 static inline uint32 vector8_highbit_mask(const Vector8 v);
 #endif
 
@@ -279,28 +278,6 @@ vector8_is_highbit_set(const Vector8 v)
 #endif
 }
 
-/*
- * Exactly like vector8_is_highbit_set except for the input type, so it
- * looks at each byte separately.
- *
- * XXX x86 uses the same underlying type for 8-bit, 16-bit, and 32-bit
- * integer elements, but Arm does not, hence the need for a separate
- * function. We could instead adopt the behavior of Arm's vmaxvq_u32(), i.e.
- * check each 32-bit element, but that would require an additional mask
- * operation on x86.
- */
-#ifndef USE_NO_SIMD
-static inline bool
-vector32_is_highbit_set(const Vector32 v)
-{
-#if defined(USE_NEON)
-	return vector8_is_highbit_set((Vector8) v);
-#else
-	return vector8_is_highbit_set(v);
-#endif
-}
-#endif							/* ! USE_NO_SIMD */
-
 /*
  * Return a bitmask formed from the high-bit of each element.
  */
-- 
2.39.5 (Apple Git-154)

v1-0002-Optimize-vector8_has_le-on-AArch64.patchtext/plain; charset=us-asciiDownload

From 43b7c5f60e309c3404771aee2ca9488790db56de Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 22 Sep 2025 16:17:09 -0500
Subject: [PATCH v1 2/2] Optimize vector8_has_le() on AArch64.

---
 src/include/port/simd.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 6424e8f5a97..a594f1c1de3 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -249,6 +249,8 @@ vector8_has_le(const Vector8 v, const uint8 c)
 			}
 		}
 	}
+#elif defined(USE_NEON)
+	result = vminvq_u8(v) <= c;
 #else
 
 	/*
-- 
2.39.5 (Apple Git-154)

John Naylor

johncnaylorls@gmail.com

4 months ago

In reply to: Nathan Bossart (#1)

Re: a couple of small patches for simd.h

On Tue, Sep 23, 2025 at 4:44 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

0001 essentially reverts commit c6a43c2, and instead fixes the problem
(MSVC dislikes casts to the same type) by omitting the cast in the
problematic line in pg_lfind32(). While working on optimizing hex_encode()
and hex_decode() [0], I noticed that implicit conversions sufficed.

I don't remember why that cast was there, but I suggest testing on gcc
/ aarch64 if you haven't already.

As for 0002, +1 for more succinct expressions when the hardware supports it.

--
John Naylor
Amazon Web Services

Nathan Bossart

nathandbossart@gmail.com

4 months ago

In reply to: John Naylor (#2)

Re: a couple of small patches for simd.h

On Wed, Sep 24, 2025 at 08:04:59AM +0700, John Naylor wrote:

On Tue, Sep 23, 2025 at 4:44 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

0001 essentially reverts commit c6a43c2, and instead fixes the problem
(MSVC dislikes casts to the same type) by omitting the cast in the
problematic line in pg_lfind32(). While working on optimizing hex_encode()
and hex_decode() [0], I noticed that implicit conversions sufficed.

I don't remember why that cast was there, but I suggest testing on gcc
/ aarch64 if you haven't already.

Ah, I thought CI was testing that, but apparently it is not. After some
basic testing in godbolt.org, I see that leaving out the cast makes
gcc/arm64 unhappy, and adding the cast makes msvc/x64 unhappy. gcc has a
-flax-vector-conversions option that fixes it, but the documentation for
that option [0]https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-flax-vector-conversions cautions against using it. So, 0001 is bogus, and I need
to figure out how to get the hex_{encode,decode} patches working for
gcc/arm64...

[0]: https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-flax-vector-conversions

--
nathan

Nathan Bossart

nathandbossart@gmail.com

4 months ago

In reply to: Nathan Bossart (#3)

1 attachment(s)

Re: a couple of small patches for simd.h

Here's a new version of 0002 with a modified SSE2 implementation, as
discussed elsewhere [0]/messages/by-id/aNWO7L43UevRErw_@nathan. This allows us to remove vector8_ssub().

[0]: /messages/by-id/aNWO7L43UevRErw_@nathan

--
nathan

Attachments:

v2-0001-Optimize-vector8_has_le-on-AArch64.patchtext/plain; charset=us-asciiDownload

From a644f049add68ea78326f88d8994898c92f23c20 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 22 Sep 2025 16:17:09 -0500
Subject: [PATCH v2 1/1] Optimize vector8_has_le() on AArch64.

---
 src/include/port/simd.h | 32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 97c5f353022..e317e5fdfc0 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -86,7 +86,6 @@ static inline uint32 vector8_highbit_mask(const Vector8 v);
 static inline Vector8 vector8_or(const Vector8 v1, const Vector8 v2);
 #ifndef USE_NO_SIMD
 static inline Vector32 vector32_or(const Vector32 v1, const Vector32 v2);
-static inline Vector8 vector8_ssub(const Vector8 v1, const Vector8 v2);
 #endif
 
 /*
@@ -250,14 +249,13 @@ vector8_has_le(const Vector8 v, const uint8 c)
 			}
 		}
 	}
-#else
+#elif defined(USE_SSE2)
+	Vector8		umin = _mm_min_epu8(v, vector8_broadcast(c));
+	Vector8		cmpe = _mm_cmpeq_epi8(umin, v);
 
-	/*
-	 * Use saturating subtraction to find bytes <= c, which will present as
-	 * NUL bytes.  This approach is a workaround for the lack of unsigned
-	 * comparison instructions on some architectures.
-	 */
-	result = vector8_has_zero(vector8_ssub(v, vector8_broadcast(c)));
+	result = vector8_is_highbit_set(cmpe);
+#elif defined(USE_NEON)
+	result = vminvq_u8(v) <= c;
 #endif
 
 	Assert(assert_result == result);
@@ -358,24 +356,6 @@ vector32_or(const Vector32 v1, const Vector32 v2)
 }
 #endif							/* ! USE_NO_SIMD */
 
-/*
- * Return the result of subtracting the respective elements of the input
- * vectors using saturation (i.e., if the operation would yield a value less
- * than zero, zero is returned instead).  For more information on saturation
- * arithmetic, see https://en.wikipedia.org/wiki/Saturation_arithmetic
- */
-#ifndef USE_NO_SIMD
-static inline Vector8
-vector8_ssub(const Vector8 v1, const Vector8 v2)
-{
-#ifdef USE_SSE2
-	return _mm_subs_epu8(v1, v2);
-#elif defined(USE_NEON)
-	return vqsubq_u8(v1, v2);
-#endif
-}
-#endif							/* ! USE_NO_SIMD */
-
 /*
  * Return a vector with all bits set in each lane where the corresponding
  * lanes in the inputs are equal.
-- 
2.39.5 (Apple Git-154)

Nathan Bossart

nathandbossart@gmail.com

4 months ago

In reply to: Nathan Bossart (#4)

1 attachment(s)

Re: a couple of small patches for simd.h

On Thu, Sep 25, 2025 at 02:10:19PM -0500, Nathan Bossart wrote:

Here's a new version of 0002 with a modified SSE2 implementation, as
discussed elsewhere [0]. This allows us to remove vector8_ssub().

Sorry for the noise. v3 fixes the mixed-declarations-and-code problems.

--
nathan

Attachments:

v3-0001-Optimize-vector8_has_le-on-AArch64.patchtext/plain; charset=us-asciiDownload

From 2d355136c9b6aede2c1cb6f8f08fa7eacd032b61 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 22 Sep 2025 16:17:09 -0500
Subject: [PATCH v3 1/1] Optimize vector8_has_le() on AArch64.

---
 src/include/port/simd.h | 37 ++++++++++---------------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 97c5f353022..79dd1f76962 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -86,7 +86,6 @@ static inline uint32 vector8_highbit_mask(const Vector8 v);
 static inline Vector8 vector8_or(const Vector8 v1, const Vector8 v2);
 #ifndef USE_NO_SIMD
 static inline Vector32 vector32_or(const Vector32 v1, const Vector32 v2);
-static inline Vector8 vector8_ssub(const Vector8 v1, const Vector8 v2);
 #endif
 
 /*
@@ -213,6 +212,10 @@ static inline bool
 vector8_has_le(const Vector8 v, const uint8 c)
 {
 	bool		result = false;
+#ifdef USE_SSE2
+	Vector8		umin;
+	Vector8		cmpe;
+#endif
 
 	/* pre-compute the result for assert checking */
 #ifdef USE_ASSERT_CHECKING
@@ -250,14 +253,12 @@ vector8_has_le(const Vector8 v, const uint8 c)
 			}
 		}
 	}
-#else
-
-	/*
-	 * Use saturating subtraction to find bytes <= c, which will present as
-	 * NUL bytes.  This approach is a workaround for the lack of unsigned
-	 * comparison instructions on some architectures.
-	 */
-	result = vector8_has_zero(vector8_ssub(v, vector8_broadcast(c)));
+#elif defined(USE_SSE2)
+	umin = _mm_min_epu8(v, vector8_broadcast(c));
+	cmpe = _mm_cmpeq_epi8(umin, v);
+	result = vector8_is_highbit_set(cmpe);
+#elif defined(USE_NEON)
+	result = vminvq_u8(v) <= c;
 #endif
 
 	Assert(assert_result == result);
@@ -358,24 +359,6 @@ vector32_or(const Vector32 v1, const Vector32 v2)
 }
 #endif							/* ! USE_NO_SIMD */
 
-/*
- * Return the result of subtracting the respective elements of the input
- * vectors using saturation (i.e., if the operation would yield a value less
- * than zero, zero is returned instead).  For more information on saturation
- * arithmetic, see https://en.wikipedia.org/wiki/Saturation_arithmetic
- */
-#ifndef USE_NO_SIMD
-static inline Vector8
-vector8_ssub(const Vector8 v1, const Vector8 v2)
-{
-#ifdef USE_SSE2
-	return _mm_subs_epu8(v1, v2);
-#elif defined(USE_NEON)
-	return vqsubq_u8(v1, v2);
-#endif
-}
-#endif							/* ! USE_NO_SIMD */
-
 /*
  * Return a vector with all bits set in each lane where the corresponding
  * lanes in the inputs are equal.
-- 
2.39.5 (Apple Git-154)

Nathan Bossart

nathandbossart@gmail.com

3 months ago

In reply to: Nathan Bossart (#5)

Re: a couple of small patches for simd.h

On Thu, Sep 25, 2025 at 04:27:35PM -0500, Nathan Bossart wrote:

On Thu, Sep 25, 2025 at 02:10:19PM -0500, Nathan Bossart wrote:

Here's a new version of 0002 with a modified SSE2 implementation, as
discussed elsewhere [0]. This allows us to remove vector8_ssub().

Sorry for the noise. v3 fixes the mixed-declarations-and-code problems.

Barring objections, I plan to commit this soon. Probably tomorrow.

--
nathan

John Naylor

johncnaylorls@gmail.com

3 months ago

In reply to: Nathan Bossart (#6)

Re: a couple of small patches for simd.h

On Fri, Oct 3, 2025 at 12:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

Barring objections, I plan to commit this soon. Probably tomorrow.

LGTM.

--
John Naylor
Amazon Web Services

Nathan Bossart

nathandbossart@gmail.com

3 months ago

In reply to: John Naylor (#7)

Re: a couple of small patches for simd.h

On Fri, Oct 03, 2025 at 02:40:22PM +0700, John Naylor wrote:

LGTM.

Committed, thanks for reviewing.

--
nathan