use SIMD in GetPrivateRefCountEntry()

Started by Nathan Bossart, 6 months ago (6 messages, hackers)
#1Nathan Bossart
nathandbossart@gmail.com

(new thread)

On Wed, Sep 03, 2025 at 02:47:25PM -0400, Andres Freund wrote:

I see a variety of reasons for increased CPU usage:

1) The private ref count infrastructure in bufmgr.c gets a bit slower once
more buffers are pinned

The problem mainly seems to be that the branches in the loop at the start of
GetPrivateRefCountEntry() are entirely unpredictable in this workload. I had
an old patch that tried to make it possible to use SIMD for the search, by
using a separate array for the Buffer ids - with that gcc generates fairly
crappy code, but does make the code branchless.

Here that substantially reduces the overhead of doing prefetching. Afterwards
it's not a meaningful source of misses anymore.
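
As a rough illustration of the layout change described in the quoted text (a sketch, not code from bufmgr.c or from the old patch): keeping the Buffer ids in their own array packs all the search keys into one contiguous block, which is what makes a vectorized comparison possible. All names prefixed with sketch_ are hypothetical, the array size of 8 is an assumption, and the early-exit loop is only meant to show the kind of per-slot branch being discussed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ARRAY_ENTRIES 8      /* assumed size of the fast-path array */
#define SKETCH_INVALID_BUFFER 0     /* assumed "no buffer" marker */

typedef int32_t Buffer;             /* stand-in for the real Buffer typedef */

/* Interleaved layout: every key is followed by its payload in memory. */
typedef struct SketchEntry
{
    Buffer  buffer;
    int32_t refcount;
} SketchEntry;

/* Split layout: the keys are contiguous, the payload lives separately. */
static Buffer  sketch_keys[SKETCH_ARRAY_ENTRIES];
static int32_t sketch_refcounts[SKETCH_ARRAY_ENTRIES];

/*
 * Baseline search with an early exit. Each iteration has a data-dependent
 * branch, which is hard to predict when the matching slot varies.
 * Returns the slot index, or -1 if the buffer is not in the array.
 */
static int
sketch_find_branchy(Buffer wanted)
{
    assert(wanted != SKETCH_INVALID_BUFFER);

    for (int i = 0; i < SKETCH_ARRAY_ENTRIES; i++)
    {
        if (sketch_keys[i] == wanted)
            return i;
    }
    return -1;
}

/* Look up the payload through the index found in the key array. */
static int32_t
sketch_refcount_for(Buffer wanted)
{
    int slot = sketch_find_branchy(wanted);

    return (slot >= 0) ? sketch_refcounts[slot] : 0;
}
```

With 4-byte ids, the eight keys in the split layout occupy 32 contiguous bytes, whereas in the interleaved layout they are spread over 64 bytes with the refcounts in between.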

I quickly hacked together some patches for this. 0001 adds new static
variables so that we have a separate array of the buffers and the index for
the current ReservedRefCountEntry. 0002 optimizes the linear search in
GetPrivateRefCountEntry() using our simd.h routines. This stuff feels
expensive (see vector8_highbit_mask()'s implementation for AArch64), but if
the main goal is to avoid branches, I think this is about as "branchless"
as we can make it. I'm going to stare at this a bit longer, but I figured
I'd get something on the lists while it is fresh in my mind.
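
To make the idea concrete, here is a minimal portable sketch of a "compare every slot, build a mask, then locate the match" search. It is not the code in the attached patches and does not use the simd.h routines; it only shows the general shape in plain C. All sketch_ names and the slot count of 8 are assumptions, and whether a compiler turns these loops into branch-free or vector instructions depends on the compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSLOTS 8             /* assumed number of fast-path slots */
#define SKETCH_INVALID_BUFFER 0     /* assumed "no buffer" marker */

typedef int32_t Buffer;             /* stand-in for the real Buffer typedef */

/*
 * Compare every slot against the wanted buffer and set bit i of the result
 * if slot i matches. No early exit: all slots are always examined.
 */
static uint32_t
sketch_match_mask(const Buffer *keys, Buffer wanted)
{
    uint32_t mask = 0;

    for (int i = 0; i < SKETCH_NSLOTS; i++)
        mask |= (uint32_t) (keys[i] == wanted) << i;

    return mask;
}

/*
 * Return the index of the first matching slot, or -1 if there is none.
 * The index of the lowest set bit is computed by counting, for each i,
 * whether bits 0..i of the mask are all zero.
 */
static int
sketch_find_branchless(const Buffer *keys, Buffer wanted)
{
    uint32_t mask;
    int      idx = 0;

    assert(wanted != SKETCH_INVALID_BUFFER);

    mask = sketch_match_mask(keys, wanted);

    for (int i = 0; i < SKETCH_NSLOTS; i++)
        idx += ((mask & ((1u << (i + 1)) - 1u)) == 0);

    /* Only this final "found or not" decision depends on the data. */
    return (mask != 0) ? idx : -1;
}
```

In a vectorized version, the comparison loop would become a handful of vector compares and the mask would come from a movemask-style operation, which is the step that needs emulation on architectures without a direct equivalent.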

--
nathan

Attachments:

v1-0001-prepare-bufmgr-for-simd.patch (text/plain; charset=us-ascii, +18 -18)
v1-0002-simd-ify-GetPrivateRefCountEntry.patch (text/plain; charset=us-ascii, +70 -1)
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#1)
Re: use SIMD in GetPrivateRefCountEntry()

Sorry for the noise. I fixed x86-64 builds in v2.

--
nathan

Attachments:

v2-0001-prepare-bufmgr-for-simd.patch (text/plain; charset=us-ascii, +18 -18)
v2-0002-simd-ify-GetPrivateRefCountEntry.patch (text/plain; charset=us-ascii, +70 -1)
#3Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Nathan Bossart (#2)
Re: use SIMD in GetPrivateRefCountEntry()

03.10.2025 23:51, Nathan Bossart wrote:

Sorry for the noise. I fixed x86-64 builds in v2.

Why not just use simplehash for private ref counts?
Without the separation into array and overflow parts.
Just a single damn simple hash table.

--
regards
Yura Sokolov aka funny-falcon

#4Andres Freund
andres@anarazel.de
In reply to: Yura Sokolov (#3)
Re: use SIMD in GetPrivateRefCountEntry()

Hi,

On October 24, 2025 3:43:34 PM GMT+03:00, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

03.10.2025 23:51, Nathan Bossart wrote:

Sorry for the noise. I fixed x86-64 builds in v2.

Why not just use simplehash for private ref counts?
Without the separation into array and overflow parts.
Just a single damn simple hash table.

It's too expensive for common access patterns in my benchmarks. Buffer accesses are very very very common and hash tables have no spatial locality.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#5Peter Geoghegan
pg@bowt.ie
In reply to: Nathan Bossart (#1)
Re: use SIMD in GetPrivateRefCountEntry()

On Fri, Oct 3, 2025 at 10:48 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

I quickly hacked together some patches for this. 0001 adds new static
variables so that we have a separate array of the buffers and the index for
the current ReservedRefCountEntry. 0002 optimizes the linear search in
GetPrivateRefCountEntry() using our simd.h routines. This stuff feels
expensive (see vector8_highbit_mask()'s implementation for AArch64), but if
the main goal is to avoid branches, I think this is about as "branchless"
as we can make it. I'm going to stare at this a bit longer, but I figured
I'd get something on the lists while it is fresh in my mind.

I was unable to notice any improvements in any of the microbenchmarks
that I've been using to test the index prefetching patch set. For
whatever reason, these test cases are neither improved nor regressed
by your patch series.

I've never really played around with SIMD before. Is the precise CPU
microarchitecture relevant? Are power management settings important?

--
Peter Geoghegan

#6Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#5)
Re: use SIMD in GetPrivateRefCountEntry()

On Fri, Oct 24, 2025 at 4:32 PM Peter Geoghegan <pg@bowt.ie> wrote:

I was unable to notice any improvements in any of the microbenchmarks
that I've been using to test the index prefetching patch set. For
whatever reason, these test cases are neither improved nor regressed
by your patch series.

Correction: there appears to be a regression at higher client counts
with standard pgbench SELECT + the index prefetching patchset + your v2
patchset. It's not a massive one (about a 5% loss in TPS/throughput), and
not one that I can reproduce at lower client counts.

There are 16 physical cores on this machine, and that seems to be
around the cutoff for getting these regressions. I've disabled
turboboost and hyperthreading on this machine, since I find that that
leads to more consistent performance, at least at lower client counts.

--
Peter Geoghegan