[patch] _bt_binsrch* improvements - equal-prefix-skip binary search

Started by Matthias van de Meentover 5 years ago3 messageshackers
Jump to latest
#1Matthias van de Meent
boekewurm+postgres@gmail.com

Hi,

I've not yet been involved in postgresql's development process, so here's a
first. Please find attached a patch for improving the BT binary search
speeds, with accompanying performance data.

v1-0001 adapts _bt_compare and _bt_binsrch* to make use of static
properties of L&Y-style nbtree indexes to speed up finding an initial
position in the nbtee.

I alter the logic from _bt_compare to accept a 'start-comparing-at-column'
argument, and to also return which column the comparison result came from.
This allows us to keep track of which columns we've already checked for
equality, and when getting deeper into the index this allows us to skip
checking the already equal key columns.

Specifically, when binary-searching through a page, we keep track of which
column was checked for the high-key, and which for the low-key. The first
column of these that is not yet equal will be used for the next comparison.
Any columns before this column must be equal, as both the high- and the
low-keys in that page have already been validated as having an equal
prefix. The column number is then passed on through the insertion key,
allowing the optimization to be used in the climbing of the nbtree index.

v1-0001 will be especially performant in nbtree indexes with a relatively
low initial cardinality and high compare cost. My own performance testing
was done on a laptop (4c/8t), after first getting it warm with other
benchmarks & compilations, so the results are a bit unstable.

While testing this I also noticed that there were a lot of pipeline stalls
around the arguments and results of _bt_compare in the hot loops of
_bt_binsrch* where the new functionality would be used, so I've moved the
core logic of _bt_compare to a static inline function _bt_compare_inline,
which helped the code to go from a 2% TPS decrease for single-column
indexes to ~ 8% TPS increase for an insert-only workload, and for 3-column
text all-collated indexes a TPS increase of 20% on my system. This also
allowed me to keep the signature of _bt_compare the same as it was,
limiting the scope of the changes to only the named functions.

Finally, this could be a great start on prefix truncation for btree
indexes, though that is _not_ the goal of this patch. This patch skips, but
does not truncate, the common prefix.

Kind regards,

Matthias van de Meent

P.S. One more thing I noticed in benchmarking is that a significant part of
the costs of non-default collations is in looking up the collation twice in
the collation cache. My guess from flame graphs is that there could be a
large throughput increase for short strings if the collation lookup from
lc_collate_is_c() in varstr_cmp could be reused in the subsequent call to
pg_newlocale_from_collation()

Attachments:

performance-tests.txttext/plain; charset=US-ASCII; name=performance-tests.txtDownload
v1-0001-Update-_bt_binsrch-to-account-for-prefix-column-equa.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Update-_bt_binsrch-to-account-for-prefix-column-equa.patchDownload+153-42
In reply to: Matthias van de Meent (#1)
Re: [patch] _bt_binsrch* improvements - equal-prefix-skip binary search

On Fri, Sep 11, 2020 at 7:57 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I've not yet been involved in postgresql's development process, so here's a first. Please find attached a patch for improving the BT binary search speeds, with accompanying performance data.

v1-0001 adapts _bt_compare and _bt_binsrch* to make use of static properties of L&Y-style nbtree indexes to speed up finding an initial position in the nbtee.

Are you familiar with this thread?

/messages/by-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com

I wrote a patch that implemented the same idea, which is sometimes
called dynamic prefix truncation. I found some very subtle problems
with it, though. Concurrent page deletions could occur, which could
cause a scan that skips a prefix to miss that the page it landed on
doesn't have the same common prefix anymore.

there could be a large throughput increase for short strings if the collation lookup from lc_collate_is_c() in varstr_cmp could be reused in the subsequent call to pg_newlocale_from_collation()

I noticed that myself recently.

--
Peter Geoghegan

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#2)
Re: [patch] _bt_binsrch* improvements - equal-prefix-skip binary search

On Fri, 11 Sep 2020 at 19:41, Peter Geoghegan <pg@bowt.ie> wrote:

Are you familiar with this thread?

/messages/by-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com

I wrote a patch that implemented the same idea, which is sometimes
called dynamic prefix truncation. I found some very subtle problems
with it, though. Concurrent page deletions could occur, which could
cause a scan that skips a prefix to miss that the page it landed on
doesn't have the same common prefix anymore.

Thank you for pointing me to that thread, I was not yet familiar with it.
It took me some time to get familiar with the Lanin and Shasha [0]https://archive.org/stream/symmetricconcurr00lani paper,
but the concerns regarding concurrent page deletion are indeed problematic
for algorithmic prefix truncation, and do make it near impossible to use
algorithmic prefix truncation without resetting the accumulated prefix
every page.

At that, I will retract this current patch, and (unless someone's already
working on it) start research on implementing physical prefix truncation
through deduplicating the prefix shared with a page's highkey. This would
likely work fine for inner nodes (there are still flag bits left unused,
and the concerns related to deduplication of equal-but-distinct values do
not apply there), though I'm uncertain about the ability to truncate
duplicate prefixes in leaf nodes as it is basically prefix deduplication
with the same problems attached as 'normal' deduplication.

Thanks,

Matthias van de Meent

[0]: https://archive.org/stream/symmetricconcurr00lani