Improving btree performance through specializing by key shape, take 2
Hi,
Here's a new patch, well-timed for the next feature freeze. [0]
Last year I submitted a patchset that updated the way nbtree keys are
compared [1] by moving from index_getattr to an iterator-based
approach. That patch did perform quite well for multi-column indexes,
but significantly reduced performance for keys that benefit from
Form_pg_attribute->attcacheoff.
Here's generation 2 of this effort. Instead of proceeding to trying to
shoehorn all types of btree keys into one common code path, this
patchset acknowledges that there exist different shapes of keys that
each have a different best way of accessing each subsequent key
attribute. This patch achieves this by specializing the functions to
different key shapes.
The approach is comparable to JIT, but significantly different in that
it optimizes a set of statically defined shapes, rather than any
arbitrary shape through a runtime compilation step. JIT could be
implemented, but not easily with the APIs available to IndexAMs: the
amhandler for indexes does not receive any information about what the
shape of the index is going to be.
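To make the static-specialization idea concrete with a toy sketch (the macro and accessor names below are invented for illustration; they are not the patchset's actual macros): the same loop body is stamped out once per "shape", so each copy can inline its own attribute-access strategy.

```c
/* One template body, compiled once per "shape". The compiler sees a
 * concrete accessor in each stamped-out copy and can inline it,
 * instead of dispatching through a generic access path at runtime. */
#define DEMO_DEFINE_SUM(suffix, get_elem)               \
    static int demo_sum_##suffix(const int *a, int n)   \
    {                                                   \
        int s = 0;                                      \
        for (int i = 0; i < n; i++)                     \
            s += get_elem(a, i);                        \
        return s;                                       \
    }

/* Two hypothetical "access strategies" standing in for the different
 * per-key-shape attribute accessors. */
#define GET_PLAIN(a, i)   ((a)[(i)])
#define GET_DOUBLED(a, i) ((a)[(i)] * 2)

DEMO_DEFINE_SUM(plain, GET_PLAIN)
DEMO_DEFINE_SUM(doubled, GET_DOUBLED)
```

The cost of this approach, as the thread discusses later, is binary size: each shape gets its own copy of the code.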
0001: Moves code that can benefit from key attribute accessor
specialization out of their current files and into specializable
files.
The functions selected for specialization are either not much more
than wrappers for specializable functions, or functions that have
(hot) loops around specializable code; where specializable means
accessing multiple IndexTuple attributes directly.
0002: Updates the specializable code to use the specialized attribute
iteration macros
0003: Optimizes access to the key column when there's only one key column.
0004: Optimizes access to the key columns when we cannot use
attcacheoff for the key columns
0005: Creates a helper function to populate all attcacheoff fields
with their correct values, populating them with -2 whenever a cache
offset is impossible to determine (as opposed to the default -1,
"unknown"), allowing functions to determine the cacheability of the
n-th attribute in O(1).
0006: Creates a specialization macro that replaces rd_indam->aminsert
with its optimized variant, for improved index tuple insertion
performance.
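Patch 0005's three-valued attcacheoff scheme can be sketched in a few lines of C (the struct and function below are mine, a simplification of the real pg_attribute/TupleDesc machinery, not the patch's code): offsets are assigned left-to-right until the first attribute whose successor offsets can no longer be statically known, after which every remaining attribute is marked -2, so "is attribute n cacheable?" becomes an O(1) field read.

```c
/* Simplified sketch of the idea in patch 0005. Names are illustrative,
 * not the actual PostgreSQL structures. */
typedef struct DemoAttr
{
    int attlen;       /* > 0 for fixed size, -1 for varlena */
    int attcacheoff;  /* -1 unknown, -2 uncacheable, >= 0 cached */
} DemoAttr;

static void
demo_populate_cacheoffs(DemoAttr *atts, int natts)
{
    int offset = 0;

    for (int i = 0; i < natts; i++)
    {
        if (offset >= 0)
        {
            atts[i].attcacheoff = offset;
            /* A variable-length attribute makes every *following*
             * attribute's offset unknowable at this level. */
            if (atts[i].attlen > 0)
                offset += atts[i].attlen;
            else
                offset = -1;    /* stop caching from here on */
        }
        else
            atts[i].attcacheoff = -2;  /* O(1) "not cacheable" answer */
    }
}
```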
These patches still have some rough edges (specifically: some
functions that are being generated are unused, and intermediate
patches don't compile), but I wanted to get this out to get some
feedback on this approach.
I expect the performance to be at least on par with current btree
code, and I'll try to publish a more polished patchset with
performance results sometime in the near future. I'll also try to
re-attach dynamic page-level prefix truncation, but that depends on
the amount of time I have and the amount of feedback on this patchset.
-Matthias
[0]: The one for PG16, that is.
[1]: /messages/by-id/CAEze2Whwvr8aYcBf0BeBuPy8mJGtwxGvQYA9OGR5eLFh6Q_ZvA@mail.gmail.com
Attachments:
v1-0004-Implement-specialized-uncacheable-attribute-itera.patch (+278, -3)
v1-0005-Add-a-function-whose-task-it-is-to-populate-all-a.patch (+99, -1)
v1-0002-Use-specialized-attribute-iterators-in-nbt-_spec..patch (+60, -41)
v1-0003-Optimize-attribute-iterator-access-for-single-col.patch (+19, -2)
v1-0001-Specialize-nbtree-functions-on-btree-key-shape.patch (+3369, -2916)
v1-0006-Specialize-the-nbtree-rd_indam-entry.patch (+33, -6)
On Fri, Apr 8, 2022 at 9:55 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
Here's generation 2 of this effort. Instead of proceeding to trying to
shoehorn all types of btree keys into one common code path, this
patchset acknowledges that there exist different shapes of keys that
each have a different best way of accessing each subsequent key
attribute. This patch achieves this by specializing the functions to
different key shapes.
Cool.
These patches still have some rough edges (specifically: some
functions that are being generated are unused, and intermediate
patches don't compile), but I wanted to get this out to get some
feedback on this approach.
I attempted to apply your patch series to get some general sense of
how it affects performance, by using my own test cases from previous
nbtree project work. I gave up on that pretty quickly, though, since
the code wouldn't compile. That in itself might have been okay (some
"rough edges" are generally okay). The real problem was that it wasn't
clear what I was expected to do about it! You mentioned that some of
the patches just didn't compile, but which ones? How do I quickly get
some idea of the benefits on offer here, however imperfect or
preliminary?
Can you post a version of this that compiles? As a general rule you
should try to post patches that can be "test driven" easily. An
opening paragraph that says "here is why you should care about my
patch" is often something to strive for, too. I suspect that you
actually could have done that here, but you didn't, for whatever
reason.
I expect the performance to be at least on par with current btree
code, and I'll try to publish a more polished patchset with
performance results sometime in the near future. I'll also try to
re-attach dynamic page-level prefix truncation, but that depends on
the amount of time I have and the amount of feedback on this patchset.
Can you give a few motivating examples? You know, queries that are
sped up by the patch series, with an explanation of where the benefit
comes from. You had some on the original thread, but that included
dynamic prefix truncation stuff as well.
Ideally you would also describe where the advertised improvements come
from for each test case -- which patch, which enhancement (perhaps
only in rough terms for now).
--
Peter Geoghegan
On Sun, Apr 10, 2022 at 2:44 PM Peter Geoghegan <pg@bowt.ie> wrote:
Can you post a version of this that compiles?
I forgot to add: the patch also bitrot due to recent commit dbafe127.
I didn't get stuck at this point (this is minor bitrot), but no reason
not to rebase.
--
Peter Geoghegan
On Sun, 10 Apr 2022 at 23:45, Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Apr 8, 2022 at 9:55 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
Here's generation 2 of this effort. Instead of proceeding to trying to
shoehorn all types of btree keys into one common code path, this
patchset acknowledges that there exist different shapes of keys that
each have a different best way of accessing each subsequent key
attribute. This patch achieves this by specializing the functions to
different key shapes.
Cool.
These patches still have some rough edges (specifically: some
functions that are being generated are unused, and intermediate
patches don't compile), but I wanted to get this out to get some
feedback on this approach.
I attempted to apply your patch series to get some general sense of
how it affects performance, by using my own test cases from previous
nbtree project work. I gave up on that pretty quickly, though, since
the code wouldn't compile. That in itself might have been okay (some
"rough edges" are generally okay). The real problem was that it wasn't
clear what I was expected to do about it! You mentioned that some of
the patches just didn't compile, but which ones? How do I quickly get
some idea of the benefits on offer here, however imperfect or
preliminary?
Can you post a version of this that compiles? As a general rule you
should try to post patches that can be "test driven" easily. An
opening paragraph that says "here is why you should care about my
patch" is often something to strive for, too. I suspect that you
actually could have done that here, but you didn't, for whatever
reason.
Yes, my bad. I pulled one patch that included unrelated changes from
the set; but I missed that it also contained some changes that
should've been in an earlier commit, thus breaking the set.
I'll send an updated patchset soon (I'm planning on moving around when
what is changed/added); but before that the attached incremental patch
should help. FYI, the patchset has been tested on commit 05023a23, and
compiles (with unused function warnings) after applying the attached
patch.
I expect the performance to be at least on par with current btree
code, and I'll try to publish a more polished patchset with
performance results sometime in the near future. I'll also try to
re-attach dynamic page-level prefix truncation, but that depends on
the amount of time I have and the amount of feedback on this patchset.
Can you give a few motivating examples? You know, queries that are
sped up by the patch series, with an explanation of where the benefit
comes from. You had some on the original thread, but that included
dynamic prefix truncation stuff as well.
Queries that I expect to be faster are situations where the index does
front-to-back attribute accesses in a tight loop and repeated index
lookups; such as in index builds, data loads, JOINs, or IN () and =
ANY () operations; and then specifically for indexes with only a
single key attribute, or indexes where we can determine based on the
index attributes' types that nocache_index_getattr will be called at
least once for a full _bt_compare call (i.e. att->attcacheoff cannot
be set for at least one key attribute).
Queries that I expect to be slower to a limited degree are hot loops
on btree indexes that do not have a specifically optimized path, as
there is some extra overhead calling the specialized functions. Other
code might also see a minimal performance impact due to an increased
binary size resulting in more cache thrashing.
Ideally you would also describe where the advertised improvements come
from for each test case -- which patch, which enhancement (perhaps
only in rough terms for now).
In the previous iteration, I discerned that there are different
"shapes" of indexes, some of which currently have significant overhead
in the existing btree infrastructure. Especially indexes with multiple
key attributes can see significant overhead while their attributes are
being extracted, which (for a significant part) can be attributed to
the O(n) behaviour of nocache_index_getattr. This O(n) overhead is
currently avoided only by indexes with only a single key attribute and
by indexes in which all key attributes have a fixed size
(att->attlen > 0).
The types of btree keys I discerned were:
CREATE INDEX ON tst ...
... (single_key_attribute)
... (varlen, other, attributes, ...)
... (fixed_size, also_fixed, ...)
... (sometimes_null, other, attributes, ...)
For single-key-attribute btrees, the performance benefits in the patch
are achieved by reducing branching in the attribute extraction: There
are no other key attributes to worry about, so much of the code
dealing with looping over attributes can inline values, and thus
reduce the amount of code generated in the hot path.
For btrees with multiple key attributes, benefits are achieved if some
attributes are of variable length (e.g. text):
On master, if your index looks like CREATE INDEX ON tst (varlen,
fixed, fixed), for the latter attributes the code will always hit the
slow path of nocache_index_getattr. This introduces a significant
overhead; as that function wants to re-establish that the requested
attribute's offset is indeed not cached and not cacheable, and
calculates the requested attributes' offset in the tuple from
effectively zero. That is usually quite wasteful, as (in btree code,
usually) we'd already calculated the previous attribute's offset just
a moment ago; which should be reusable.
In this patch, the code will use an attribute iterator (as described
and demonstrated in the linked thread) to remove this O(n)
per-attribute overhead and change the worst-case complexity of
iterating over the attributes of such an index tuple from O(n^2) to
O(n).
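As a toy model of that iterator (this is not the patch's actual API, and the one-byte varlena length header below is a made-up simplification of the real varlena format): the running offset is carried between calls, so attribute n's offset is computed from attribute n-1's, rather than recomputed from the start of the tuple each time, as nocache_index_getattr effectively does.

```c
#include <stddef.h>

/* Hypothetical flat tuple: each attribute is either fixed-size
 * (attlen > 0) or variable-length (attlen == -1, with its total
 * length stored in the first byte). */
typedef struct DemoIter
{
    const unsigned char *data;
    const int *attlens;
    int     cur;      /* next attribute number */
    size_t  offset;   /* running offset, carried between calls */
} DemoIter;

/* Return the offset of the next attribute and advance past it.
 * O(1) per call, so O(n) for a full pass over the tuple. */
static size_t
demo_iter_next(DemoIter *it)
{
    size_t off = it->offset;
    int len = it->attlens[it->cur];

    /* variable-length: decode the (toy, 1-byte) length header */
    if (len < 0)
        len = it->data[off];

    it->offset = off + (size_t) len;
    it->cur++;
    return off;
}
```

Recomputing each attribute's offset from offset 0 instead would do O(n) work per attribute, which is where the O(n^2) worst case in the current code comes from.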
Null attributes in the key are not yet handled in any special manner
in this patch. That is mainly because it is impossible to statically
determine which attribute is going to be null based on the index
definition alone, and thus doesn't benefit as much from statically
generated optimized paths.
-Matthias
Attachments:
v1-0007-Add_missed_declarations_for__bt_keep_natts.patch (+2, -0)
On Sun, Apr 10, 2022 at 4:08 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
I'll send an updated patchset soon (I'm planning on moving around when
what is changed/added); but before that the attached incremental patch
should help. FYI, the patchset has been tested on commit 05023a23, and
compiles (with unused function warnings) after applying the attached
patch.
I can get it to work now, with your supplemental patch.
Queries that I expect to be faster are situations where the index does
front-to-back attribute accesses in a tight loop and repeated index
lookups; such as in index builds, data loads, JOINs, or IN () and =
ANY () operations; and then specifically for indexes with only a
single key attribute, or indexes where we can determine based on the
index attributes' types that nocache_index_getattr will be called at
least once for a full _bt_compare call (i.e. att->attcacheoff cannot
be set for at least one key attribute).
I did some quick testing of the patch series -- pretty much just
reusing my old test suite from the Postgres 12 and 13 nbtree work.
This showed that there is a consistent improvement in some cases. It
also failed to demonstrate any performance regressions. That's
definitely a good start.
I saw about a 4% reduction in runtime for the same UK land registry
test that you yourself have run in the past for the same patch series
[1]. I suspect that there just aren't that many ways to get that kind
of speed up with this test case, except perhaps by further compressing
the on-disk representation used by nbtree. My guess is that the patch
reduces the runtime for this particular test case to a level that's
significantly closer to the limit for this particular piece of
silicon. Which is not to be sniffed at.
Admittedly these test cases were chosen purely because they were
convenient. They were originally designed to test space utilization,
which isn't affected either way here. I like writing reproducible test
cases for indexing stuff, and think that it could work well here too
(even though you're not optimizing space utilization at all). A real
test suite that targets a deliberately chosen cross section of "index
shapes" might work very well.
In the previous iteration, I discerned that there are different
"shapes" of indexes, some of which currently have significant overhead
in the existing btree infrastructure. Especially indexes with multiple
key attributes can see significant overhead while their attributes are
being extracted, which (for a significant part) can be attributed to
the O(n) behaviour of nocache_index_getattr. This O(n) overhead is
currently avoided only by indexes with only a single key attribute and
by indexes in which all key attributes have a fixed size (att->attlen > 0).
Good summary.
The types of btree keys I discerned were:
CREATE INDEX ON tst ...
... (single_key_attribute)
... (varlen, other, attributes, ...)
... (fixed_size, also_fixed, ...)
... (sometimes_null, other, attributes, ...)
For single-key-attribute btrees, the performance benefits in the patch
are achieved by reducing branching in the attribute extraction: There
are no other key attributes to worry about, so much of the code
dealing with looping over attributes can inline values, and thus
reduce the amount of code generated in the hot path.
I agree that it might well be useful to bucket indexes into several
different "index shape archetypes" like this. Roughly the same
approach worked well for me in the past. This scheme might turn out to
be reductive, but even then it could still be very useful (all models
are wrong, some are useful, now as ever).
For btrees with multiple key attributes, benefits are achieved if some
attributes are of variable length (e.g. text):
On master, if your index looks like CREATE INDEX ON tst (varlen,
fixed, fixed), for the latter attributes the code will always hit the
slow path of nocache_index_getattr. This introduces a significant
overhead; as that function wants to re-establish that the requested
attribute's offset is indeed not cached and not cacheable, and
calculates the requested attributes' offset in the tuple from
effectively zero.
Right. So this particular index shape seems like something that we
treat in a rather naive way currently.
Can you demonstrate that with a custom test case? (The result I cited
before was from a '(varlen,varlen,varlen)' index, which is important,
but less relevant.)
[1]: /messages/by-id/CAEze2Whwvr8aYcBf0BeBuPy8mJGtwxGvQYA9OGR5eLFh6Q_ZvA@mail.gmail.com
--
Peter Geoghegan
On Mon, 11 Apr 2022 at 03:11, Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, Apr 10, 2022 at 4:08 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
I'll send an updated patchset soon (I'm planning on moving around when
what is changed/added); but before that the attached incremental patch
should help. FYI, the patchset has been tested on commit 05023a23, and
compiles (with unused function warnings) after applying the attached
patch.
I can get it to work now, with your supplemental patch.
Great. Attached is the updated patchset, with as main changes:
- Rebased on top of 5bb2b6ab
- All patches should compile when built on top of each preceding patch.
- Reordered the patches to a more logical order, and cleaned up the
content of each patch
- Updated code so that GCC doesn't warn about unused code.
- Add a patch for dynamic prefix truncation at page level; ref thread at [1].
- Fixed issues in the specialization macros that resulted in issues
with DPT above.
Still to-do:
- Validate performance and share the numbers for the same test indexes
in [1]. I'm planning on doing that next Monday.
- Decide whether / how to keep the NBTS_ENABLED flag. The current
#define in nbtree.h is a bad example of a compile-time configuration,
that should be changed (even if we only want to be able to disable
specialization at compile-time, it should be moved).
Maybe:
- More tests: PG already extensively tests the btree code while it is
running the test suite - btree is the main index AM - but more tests
might be needed to test the validity of the specialized code.
Queries that I expect to be faster are situations where the index does
front-to-back attribute accesses in a tight loop and repeated index
lookups; such as in index builds, data loads, JOINs, or IN () and =
ANY () operations; and then specifically for indexes with only a
single key attribute, or indexes where we can determine based on the
index attributes' types that nocache_index_getattr will be called at
least once for a full _bt_compare call (i.e. att->attcacheoff cannot
be set for at least one key attribute).
I did some quick testing of the patch series -- pretty much just
reusing my old test suite from the Postgres 12 and 13 nbtree work.
This showed that there is a consistent improvement in some cases. It
also failed to demonstrate any performance regressions. That's
definitely a good start.
I saw about a 4% reduction in runtime for the same UK land registry
test that you yourself have run in the past for the same patch series
[1].
That's good to know. The updated patches (as attached) have dynamic
prefix truncation from the patch series in [1] added too, which should
improve the performance by a few more percentage points in that
specific test case.
I suspect that there just aren't that many ways to get that kind
of speed up with this test case, except perhaps by further compressing
the on-disk representation used by nbtree. My guess is that the patch
reduces the runtime for this particular test case to a level that's
significantly closer to the limit for this particular piece of
silicon. Which is not to be sniffed at.
Admittedly these test cases were chosen purely because they were
convenient. They were originally designed to test space utilization,
which isn't affected either way here. I like writing reproducible test
cases for indexing stuff, and think that it could work well here too
(even though you're not optimizing space utilization at all). A real
test suite that targets a deliberately chosen cross section of "index
shapes" might work very well.
I'm not sure what you're referring to. Is the set of indexes I used in
[1]?
In the previous iteration, I discerned that there are different
"shapes" of indexes, some of which currently have significant overhead
in the existing btree infrastructure. Especially indexes with multiple
key attributes can see significant overhead while their attributes are
being extracted, which (for a significant part) can be attributed to
the O(n) behaviour of nocache_index_getattr. This O(n) overhead is
currently avoided only by indexes with only a single key attribute and
by indexes in which all key attributes have a fixed size (att->attlen > 0).
Good summary.
The types of btree keys I discerned were:
CREATE INDEX ON tst ...
... (single_key_attribute)
... (varlen, other, attributes, ...)
... (fixed_size, also_fixed, ...)
... (sometimes_null, other, attributes, ...)
For single-key-attribute btrees, the performance benefits in the patch
are achieved by reducing branching in the attribute extraction: There
are no other key attributes to worry about, so much of the code
dealing with looping over attributes can inline values, and thus
reduce the amount of code generated in the hot path.
I agree that it might well be useful to bucket indexes into several
different "index shape archetypes" like this. Roughly the same
approach worked well for me in the past. This scheme might turn out to
be reductive, but even then it could still be very useful (all models
are wrong, some are useful, now as ever).
For btrees with multiple key attributes, benefits are achieved if some
attributes are of variable length (e.g. text):
On master, if your index looks like CREATE INDEX ON tst (varlen,
fixed, fixed), for the latter attributes the code will always hit the
slow path of nocache_index_getattr. This introduces a significant
overhead; as that function wants to re-establish that the requested
attribute's offset is indeed not cached and not cacheable, and
calculates the requested attributes' offset in the tuple from
effectively zero.
Right. So this particular index shape seems like something that we
treat in a rather naive way currently.
But really every index shape is treated naively, except the cacheable
index shapes. The main reason we haven't cared about it much is that
you don't often see btrees with many key attributes, and when it's
slow that is explained away 'because it is a big index and a wide
index key' but still saves orders of magnitude over a table scan, so
people generally don't complain about it. A notable exception was
80b9e9c4, where a customer complained about index scans being faster
than index only scans.
Can you demonstrate that with a custom test case? (The result I cited
before was from a '(varlen,varlen,varlen)' index, which is important,
but less relevant.)
Anything that has a variable length in any attribute other than the
last; so that includes (varlen, int) and also (int, int, varlen, int,
int, int, int).
The catalogs currently seem to include only one such index:
pg_proc_proname_args_nsp_index is an index on pg_proc (name (const),
oidvector (varlen), oid (const)).
- Matthias
[1]: /messages/by-id/CAEze2WhyBT2bKZRdj_U0KS2Sbewa1XoO_BzgpzLC09sa5LUROg@mail.gmail.com
Attachments:
v2-0002-Use-specialized-attribute-iterators-in-backend-nb.patch (+62, -42)
v2-0005-Add-a-function-whose-task-it-is-to-populate-all-a.patch (+99, -1)
v2-0003-Specialize-the-nbtree-rd_indam-entry.patch (+25, -1)
v2-0004-Optimize-attribute-iterator-access-for-single-col.patch (+64, -2)
v2-0001-Specialize-nbtree-functions-on-btree-key-shape.patch (+3357, -2916)
v2-0006-Implement-specialized-uncacheable-attribute-itera.patch (+223, -3)
v2-0007-Implement-dynamic-prefix-compression-in-nbtree.patch (+127, -37)
On Sat, 16 Apr 2022 at 01:05, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
Still to-do:
- Validate performance and share the numbers for the same test indexes
in [1]. I'm planning on doing that next Monday.
While working on benchmarking the v2 patchset, I noticed no
improvement on reindex, which I attributed to forgetting to also
specialize comparetup_index_btree in tuplesort.c. After adding the
specialization there as well (attached in v3), reindex performance
improved significantly too.
Performance results attached in pgbench_log.[master,patched], which
includes the summarized output. Notes on those results:
- single-column reindex seem to have the same performance between
patch and master, within 1% error margins.
- multi-column indexes with useful ->attcacheoff also sees no obvious
performance degradation
- multi-column indexes with no useful ->attcacheoff see significant
insert performance improvement:
-8% runtime on 3 text attributes with default (C) collation ("ccl");
-9% runtime on 1 'en_US'-collated attribute + 2 text attributes
("ccl_collated");
-13% runtime on 1 null + 3 text attributes ("accl");
-74% runtime (!) on 31 'en_US'-collated 0-length text attributes + 1
uuid attribute ("worstcase" - I could not think of a worse index shape
than this one).
- reindex performance gains are much more visible: up to 84% (!) less
time spent for "worstcase", and 18-31% for the other multi-column
indexes mentioned above.
Other notes:
- The dataset I used is the same as I used in [1]: the pp-complete.csv
as was available on 2021-06-20, containing 26070307 rows.
- The performance was measured on 7 runs of the attached bench script,
using pgbench to measure statement times etc.
- Database was initialized with C locale, all tables are unlogged and
source table was VACUUM (FREEZE, ANALYZE)-ed before starting.
- (!) On HEAD @ 5bb2b6ab, INSERT is faster than REINDEX for the
"worstcase" index. I've not yet discovered why (only lightly looked
into it, no sort debugging), and considering that the issue does not
appear in similar quantities in the patched version, I'm not planning
on putting a lot of time into that.
- Per-transaction results for the run on master were accidentally
deleted, I don't consider them important enough to re-run the
benchmark.
- Decide whether / how to keep the NBTS_ENABLED flag. The current
#define in nbtree.h is a bad example of a compile-time configuration,
that should be changed (even if we only want to be able to disable
specialization at compile-time, it should be moved).
Maybe:
- More tests: PG already extensively tests the btree code while it is
running the test suite - btree is the main index AM - but more tests
might be needed to test the validity of the specialized code.
No work on those points yet.
I'll add this to CF 2022-07 for tracking.
Kind regards,
Matthias van de Meent.
[1]: /messages/by-id/CAEze2WhyBT2bKZRdj_U0KS2Sbewa1XoO_BzgpzLC09sa5LUROg@mail.gmail.com
Attachments:
performance_comparison.txt
v3-0003-Specialize-the-nbtree-rd_indam-entry.patch (+25, -1)
v3-0001-Specialize-nbtree-functions-on-btree-key-shape.patch (+3357, -2916)
v3-0002-Use-specialized-attribute-iterators-in-backend-nb.patch (+62, -42)
v3-0004-Optimize-attribute-iterator-access-for-single-col.patch (+64, -2)
v3-0005-Add-a-function-whose-task-it-is-to-populate-all-a.patch (+99, -1)
v3-0008-Add-specialization-to-btree-index-creation.patch (+175, -139)
v3-0007-Implement-dynamic-prefix-compression-in-nbtree.patch (+127, -37)
v3-0006-Implement-specialized-uncacheable-attribute-itera.patch (+224, -3)
On Sun, 5 Jun 2022 at 21:12, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
While working on benchmarking the v2 patchset, I noticed no
improvement on reindex, which I attributed to forgetting to also
specialize comparetup_index_btree in tuplesort.c. After adding the
specialization there as well (attached in v3), reindex performance
improved significantly too.
PFA version 4 of this patchset. Changes:
- Silence the compiler warnings,
- Extract itup_attiter code into its own header, so that we don't get
compiler warnings and pass the cfbot builds,
- Re-order patches to be in a more logical order,
- Updates to the dynamic prefix compression so that we don't always do
a _bt_compare on the pages' highkey. memcmp(parentpage_rightsep,
highkey) == 0 is often true, and allows us to skip the indirect
function calls in _bt_compare most of the time.
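That shortcut amounts to a cheap byte-wise pre-check before the expensive per-attribute comparison; roughly (with illustrative names, not the actual nbtree code): if the child page's high key is byte-for-byte identical to the separator key already compared against in the parent, the result of that earlier comparison still holds, so the indirect-function-call-heavy _bt_compare walk can be skipped.

```c
#include <stdbool.h>
#include <string.h>

/* Toy sketch of the highkey shortcut described above. A byte-identical
 * key must compare equal, so the earlier comparison result can be
 * reused; any byte difference means we fall back to the full
 * per-attribute comparison. */
static bool
demo_highkey_compare_skippable(const unsigned char *parent_sep, size_t sep_len,
                               const unsigned char *child_highkey, size_t hk_len)
{
    return sep_len == hk_len &&
           memcmp(parent_sep, child_highkey, sep_len) == 0;
}
```

Note that the converse does not hold: keys that differ as bytes may still compare equal under a collation, so a failed memcmp only means the fast path is unavailable, not that the keys differ.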
Based on local measurements, this patchset improves performance for
all key shapes, with 2% to 600+% increased throughput (2-86% faster
operations), depending on key shape. As can be seen in the attached
pgbench output, the performance results are based on beta1 (f00a4f02,
dated 2022-06-04) and thus not 100% current, but considering that no
significant changes have been made in the btree AM code since, I
believe these measurements are still quite valid.
I also didn't re-run the numbers for the main branch; but I compared
against the results of master in the last mail. This is because I run
the performance tests locally, and a 7-iteration pgbench run for
master requires 9 hours of downtime with this dataset, during which I
can't use the system so as to not interfere with the performance
tests. As such, I considered rerunning the benchmark for master to be
not worth the time/effort/cost with the little changes that were
committed.
Kind regards,
Matthias van de Meent.
Attachments:
pgbench_log.patched_v4.txt
v4-0001-Specialize-nbtree-functions-on-btree-key-shape.patch (+3357, -2916)
v4-0002-Use-specialized-attribute-iterators-in-backend-nb.patch (+62, -42)
v4-0003-Specialize-the-nbtree-rd_indam-entry.patch (+25, -1)
v4-0004-Optimize-attribute-iterator-access-for-single-col.patch (+71, -2)
v4-0005-Add-a-function-whose-task-it-is-to-populate-all-a.patch (+99, -1)
v4-0006-Implement-specialized-uncacheable-attribute-itera.patch (+244, -3)
v4-0007-Add-specialization-to-btree-index-creation.patch (+175, -139)
v4-0008-Implement-dynamic-prefix-compression-in-nbtree.patch (+169, -37)
On Mon, 4 Jul 2022 at 16:18, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Sun, 5 Jun 2022 at 21:12, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>> While working on benchmarking the v2 patchset, I noticed no
>> improvement on reindex, which I attributed to forgetting to also
>> specialize comparetup_index_btree in tuplesort.c. After adding the
>> specialization there as well (attached in v3), reindex performance
>> improved significantly too.
>
> PFA version 4 of this patchset. Changes:
Version 5 now, which is identical to v4 except for bitrot fixes to
deal with f58d7073.
Kind regards,
Matthias van de Meent.
Attachments:
- v5-0001-Specialize-nbtree-functions-on-btree-key-shape.patch (+3357/-2916)
- v5-0002-Use-specialized-attribute-iterators-in-backend-nb.patch (+62/-42)
- v5-0003-Specialize-the-nbtree-rd_indam-entry.patch (+25/-1)
- v5-0004-Optimize-attribute-iterator-access-for-single-col.patch (+71/-2)
- v5-0005-Add-a-function-whose-task-it-is-to-populate-all-a.patch (+99/-1)
- v5-0006-Implement-specialized-uncacheable-attribute-itera.patch (+249/-4)
- v5-0007-Add-specialization-to-btree-index-creation.patch (+175/-139)
- v5-0008-Implement-dynamic-prefix-compression-in-nbtree.patch (+169/-37)
On Wed, 27 Jul 2022 at 09:35, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Mon, 4 Jul 2022 at 16:18, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>> [...]
>> PFA version 4 of this patchset. Changes:
>
> Version 5 now, which is identical to v4 except for bitrot fixes to
> deal with f58d7073.
... and now v6 to deal with d0b193c0 and co.
I probably should've waited a bit longer this morning and checked
master before sending, but that's not how it went. Sorry for the
noise.
Kind regards,
Matthias van de Meent
Attachments:
- v6-0001-Specialize-nbtree-functions-on-btree-key-shape.patch (+3355/-2914)
- v6-0002-Use-specialized-attribute-iterators-in-backend-nb.patch (+62/-42)
- v6-0003-Specialize-the-nbtree-rd_indam-entry.patch (+25/-1)
- v6-0004-Optimize-attribute-iterator-access-for-single-col.patch (+71/-2)
- v6-0005-Add-a-function-whose-task-it-is-to-populate-all-a.patch (+99/-1)
- v6-0006-Implement-specialized-uncacheable-attribute-itera.patch (+249/-4)
- v6-0007-Add-specialization-to-btree-index-creation.patch (+196/-141)
- v6-0008-Implement-dynamic-prefix-compression-in-nbtree.patch (+169/-37)
On Wed, 27 Jul 2022 at 13:34, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Wed, 27 Jul 2022 at 09:35, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>> [...]
>> Version 5 now, which is identical to v4 except for bitrot fixes to
>> deal with f58d7073.
>
> ... and now v6 to deal with d0b193c0 and co.
>
> I probably should've waited a bit longer this morning and checked
> master before sending, but that's not how it went. Sorry for the
> noise.
Here's the dynamic prefix truncation patch on its own (this was 0008).
I'll test the performance of this tomorrow, but at least it compiles
and passes check-world against HEAD @ 6e10631d. If performance doesn't
disappoint (isn't measurably worse in known workloads), this will be
the only patch in the patchset - specialization would then be dropped.
Else, tomorrow I'll post the remainder of the patchset that
specializes the nbtree functions on key shape.
Kind regards,
Matthias van de Meent.
Attachments:
- v7-0001-Implement-dynamic-prefix-compression-in-nbtree.patch (+188/-35)
Hi Matthias,
I'm going to look at this patch series if you're still interested. What was the status of your final performance testing for the 0008 patch alone vs the specialization series? Last I saw on the thread you were going to see if the specialization was required or not.
Best,
David
On Thu, 12 Jan 2023 at 16:11, David Christensen <david@pgguru.net> wrote:
> Hi Matthias,
>
> I'm going to look at this patch series if you're still interested. What was the status of your final performance testing for the 0008 patch alone vs the specialization series? Last I saw on the thread you were going to see if the specialization was required or not.
Thank you for your interest, and sorry for the delayed response. I've
been working on rebasing and polishing the patches, and hit some
issues benchmarking the set. Attached in Perf_results.xlsx are the
results of my benchmarks, and a new rebased patchset.
TL;DR for benchmarks: there may be a small regression (0.5-1%) in the
patchset for reindex and insert-based workloads in certain corner
cases, but I can't rule out that it's just a quirk of my testing
setup. Master was taken at eb5ad4ff, and all patches are based on that
as well.
0001 (former 0008) sees 2% performance loss on average on
non-optimized index insertions - this performance loss is fixed with
the rest of the patchset.
The patchset was reordered again: 0001 contains the dynamic prefix
truncation changes; 0002 and 0003 refactor and update btree code to
specialize on key shape, and 0004 and 0006 define the specializations.
0005 is a separate addition to the attcacheoff infrastructure that is
useful on its own; it flags an attribute with 'this attribute cannot
have a cached offset, look at this other attribute instead'.
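To illustrate the idea behind 0005, here is a minimal sketch. It is not the patch's code: `ToyAttr`, the constant names, and the layout rules (no alignment, no nulls) are all made up for this example. The point is the -2 sentinel, which lets later code distinguish "offset not computed yet" (-1) from "offset can never be cached" (-2) in a single populating pass:

```c
#include <assert.h>

#define ATTCACHEOFF_UNKNOWN   (-1)  /* not yet computed */
#define ATTCACHEOFF_UNCACHED  (-2)  /* can never be cached */

typedef struct ToyAttr
{
    int attlen;         /* > 0: fixed width; -1: variable width */
    int attcacheoff;
} ToyAttr;

/*
 * Populate every attcacheoff in one pass: attributes whose start offset
 * is statically known get it cached (including the first variable-width
 * attribute itself), and everything after the first variable-width
 * attribute is marked permanently uncacheable.  Alignment padding and
 * nullability, which the real attcacheoff rules must handle, are
 * ignored in this toy.
 */
static void
populate_attcacheoff(ToyAttr *attrs, int natts)
{
    int off = 0;
    int i = 0;

    for (; i < natts && off >= 0; i++)
    {
        attrs[i].attcacheoff = off;
        off = (attrs[i].attlen > 0) ? off + attrs[i].attlen : -1;
    }
    for (; i < natts; i++)
        attrs[i].attcacheoff = ATTCACHEOFF_UNCACHED;
}
```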
A significant change from previous versions is that the specialized
function identifiers are published as macros, so `_bt_compare` is
published as a macro that (based on context) calls the specialized
version. This reduced a lot of cognitive overhead and churn in the
code.
Kind regards,
Matthias van de Meent
Attachments:
- v8-0001-Implement-dynamic-prefix-compression-in-nbtree.patch (+196/-35)
- v8-0002-Specialize-nbtree-functions-on-btree-key-shape.patch (+3609/-3136)
- v8-0003-Use-specialized-attribute-iterators-in-the-specia.patch (+95/-74)
- v8-0004-Optimize-nbts_attiter-for-nkeyatts-1-btrees.patch (+102/-6)
- v8-0005-Add-an-attcacheoff-populating-function.patch (+113/-1)
- v8-0006-btree-specialization-for-variable-length-multi-at.patch (+259/-12)
On Fri, 20 Jan 2023 at 20:37, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Thu, 12 Jan 2023 at 16:11, David Christensen <david@pgguru.net> wrote:
>> [...]
>
> Thank you for your interest, and sorry for the delayed response. I've
> been working on rebasing and polishing the patches, and hit some
> issues benchmarking the set. Attached in Perf_results.xlsx are the
> results of my benchmarks, and a new rebased patchset.
Attached v9 which rebases the patchset on b90f0b57 to deal with
compilation errors after d952373a. It also cleans up 0001 which
previously added an unrelated file, but is otherwise unchanged.
Kind regards,
Matthias van de Meent
Attachments:
- v9-0001-Implement-dynamic-prefix-compression-in-nbtree.patch (+188/-35)
- v9-0002-Specialize-nbtree-functions-on-btree-key-shape.patch (+3609/-3136)
- v9-0003-Use-specialized-attribute-iterators-in-the-specia.patch (+95/-74)
- v9-0004-Optimize-nbts_attiter-for-nkeyatts-1-btrees.patch (+102/-6)
- v9-0005-Add-an-attcacheoff-populating-function.patch (+113/-1)
- v9-0006-btree-specialization-for-variable-length-multi-at.patch (+260/-12)
On Mon, 23 Jan 2023 at 14:54, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Fri, 20 Jan 2023 at 20:37, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>> On Thu, 12 Jan 2023 at 16:11, David Christensen <david@pgguru.net> wrote:
>>> Last I saw on the thread you were going to see if the specialization
>>> was required or not.
>>
>> Thank you for your interest, and sorry for the delayed response. [...]
Attached v10, which fixes one compile warning, and fixes
headerscheck/cpluspluscheck by adding nbtree_spec.h and
nbtree_specfuncs.h to the ignored headers files. It also fixes some
cases where later patches modified earlier patches' code when the
change should have been incorporated in the earlier patch instead.
I think this is ready for review.
The top-level design of the patchset:
0001 modifies btree descent code to use dynamic prefix compression,
i.e. skip comparing columns in binary search when the same column on
tuples on both the left and the right of this tuple have been compared
as 'equal'.
It also includes an optimized path for when the downlink tuple's right
neighbor's data is bytewise equal to the highkey of the page we
descended onto - in those cases we don't need to run _bt_compare on
the index tuple, as we know that the result will be the same as that
of the downlink tuple, i.e. it compares as "less than".
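For intuition, that shortcut can be sketched as follows. This is a toy model, not the patch: `ToyKey` is a hypothetical stand-in for an IndexTuple's raw bytes, and the real code additionally has to worry about locking and suffix truncation. The point is that a cheap memcmp against bytes we already compared in the parent lets us skip an indirect, opclass-supplied comparison:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an index tuple's raw key bytes. */
typedef struct ToyKey
{
    const void *data;
    size_t      len;
} ToyKey;

/*
 * On descent, the parent's binary search already compared the scan key
 * against the right separator of the chosen downlink and found the key
 * to be "less than" it.  If the child's high key is bytewise identical
 * to that separator, a _bt_compare against the high key must return
 * the same result, so the expensive compare can be skipped entirely.
 */
static bool
can_reuse_parent_result(const ToyKey *parent_right_sep,
                        const ToyKey *child_highkey)
{
    return parent_right_sep->len == child_highkey->len &&
           memcmp(parent_right_sep->data, child_highkey->data,
                  child_highkey->len) == 0;
}
```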
NOTE that this patch when applied as stand-alone adds overhead for all
indexes, with the benefits of the patch limited to non-unique indexes
or indexes where the uniqueness is guaranteed only at later
attributes. Later patches in the patchset return performance to a
similar level as before 0001 for the impacted indexes.
0002 creates a scaffold for specializing nbtree functions, and moves
the functions I selected for specialization into separate files. Those
separate files (postfixed with _spec) are included in the original
files through inclusion of the nbtree specialization header file with
a macro variable. The code itself has not materially changed yet at
this point.
0003 updates the functions selected in 0002 to utilize the
specializable attribute iterator macros instead of manual attribute
iteration.
Then, 0004 adds specialization for single-attribute indexes,
0005 adds a helper function for populating attcacheoff (which is
separately useful too, but essential in this patchset), and
0006 adds specialization for multi-column indexes of which the offsets
of the last key column cannot be known.
Kind regards,
Matthias van de Meent.
Attachments:
- v10-0001-Implement-dynamic-prefix-compression-in-nbtree.patch (+189/-35)
- v10-0002-Specialize-nbtree-functions-on-btree-key-shape.patch (+3613/-3134)
- v10-0003-Use-specialized-attribute-iterators-in-the-speci.patch (+92/-67)
- v10-0004-Optimize-nbts_attiter-for-nkeyatts-1-btrees.patch (+93/-3)
- v10-0005-Add-an-attcacheoff-populating-function.patch (+113/-1)
- v10-0006-btree-specialization-for-variable-length-multi-a.patch (+258/-10)
Hm. The cfbot has a fairly trivial issue with this with an unused variable:
[18:36:17.405] In file included from ../../src/include/access/nbtree.h:1184,
[18:36:17.405] from verify_nbtree.c:27:
[18:36:17.405] verify_nbtree.c: In function ‘palloc_btree_page’:
[18:36:17.405] ../../src/include/access/nbtree_spec.h:51:23: error:
unused variable ‘__nbts_ctx’ [-Werror=unused-variable]
[18:36:17.405] 51 | #define NBTS_CTX_NAME __nbts_ctx
[18:36:17.405] | ^~~~~~~~~~
[18:36:17.405] ../../src/include/access/nbtree_spec.h:54:43: note: in
expansion of macro ‘NBTS_CTX_NAME’
[18:36:17.405] 54 | #define NBTS_MAKE_CTX(rel) const NBTS_CTX
NBTS_CTX_NAME = _nbt_spec_context(rel)
[18:36:17.405] | ^~~~~~~~~~~~~
[18:36:17.405] ../../src/include/access/nbtree_spec.h:264:28: note: in
expansion of macro ‘NBTS_MAKE_CTX’
[18:36:17.405] 264 | #define nbts_prep_ctx(rel) NBTS_MAKE_CTX(rel)
[18:36:17.405] | ^~~~~~~~~~~~~
[18:36:17.405] verify_nbtree.c:2974:2: note: in expansion of macro
‘nbts_prep_ctx’
[18:36:17.405] 2974 | nbts_prep_ctx(NULL);
[18:36:17.405] | ^~~~~~~~~~~~~
On Tue, 4 Apr 2023 at 17:43, Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
> Hm. The cfbot has a fairly trivial issue with this with an unused variable:
Attached a rebase on top of HEAD @ 8cca660b to make the patch apply
again. I think this is ready for review once again, as the patchset
has seen no significant changes since v8 of the patchset this January [0].
Kind regards,
Matthias van de Meent
Neon, Inc.
= Description of the patchset so far:
This patchset implements two features that *taken together* improve
the performance of our btree implementation:
== Dynamic prefix truncation (0001)
The code now tracks how many prefix attributes of the scan key are
already considered equal based on earlier binsrch results, and ignores
those prefix columns in further binsrch operations (sorted list; if
both the high and low value of your range have the same prefix, the
middle value will have that prefix, too). This reduces the number of
calls into opclass-supplied (dynamic) compare functions, and thus
increases performance for multi-key-attribute indexes where shared
prefixes are common (e.g. index on (customer, order_id)).
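As an illustration of the technique only - a simplified sketch with fixed-width integer attributes and made-up names, not the patch's code, though the 1-based `cmpcol` convention is borrowed from it - a binary search that maintains separate known-equal-prefix counters for its two bounds could look like this:

```c
#include <assert.h>
#include <stddef.h>

#define NATTS 2

typedef struct ExampleTuple { int atts[NATTS]; } ExampleTuple;

/*
 * Compare key against tup, skipping the first (*cmpcol - 1) attributes,
 * which the caller has already proven equal.  *cmpcol is advanced past
 * every attribute found equal, so on return it is the number of the
 * first attribute NOT known to be equal (1-based).
 */
static int
tuple_compare(const int *key, const ExampleTuple *tup, int *cmpcol)
{
    for (int col = *cmpcol; col <= NATTS; col++)
    {
        if (key[col - 1] != tup->atts[col - 1])
            return (key[col - 1] < tup->atts[col - 1]) ? -1 : 1;
        *cmpcol = col + 1;      /* this column compared equal */
    }
    return 0;
}

/*
 * Binary search returning the first offset >= key.  Because the array
 * is sorted, any attribute prefix shared by the key with both the low
 * and the high bound is also shared with every tuple in between, so
 * min(low_prefix, high_prefix) - 1 attributes can be skipped.
 */
static size_t
prefix_binsrch(const int *key, const ExampleTuple *tuples, size_t n)
{
    size_t  low = 0, high = n;
    int     low_prefix = 1, high_prefix = 1;    /* 1 = nothing skipped */

    while (low < high)
    {
        size_t  mid = low + (high - low) / 2;
        int     cmpcol = (low_prefix < high_prefix) ? low_prefix
                                                    : high_prefix;
        int     r = tuple_compare(key, &tuples[mid], &cmpcol);

        if (r > 0)
        {
            low = mid + 1;
            low_prefix = cmpcol;
        }
        else
        {
            high = mid;
            high_prefix = cmpcol;
        }
    }
    return low;
}
```

With an index-like list where most tuples share the first attribute, the inner compare usually starts at the second attribute, which is exactly the saving the patch aims for on (customer, order_id)-style keys.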
== Index key shape code specialization (0002-0006)
Index tuple attribute accesses for multi-column indexes are often done
through index_getattr, which gets very expensive for indexes which
cannot use attcacheoff. However, single-key and attcacheoff-able
indexes do benefit greatly from the attcacheoff optimization, so we
can't just stop applying the optimization. This is why the second part
of the patchset (0002 and up) adds infrastructure to generate
specialized code paths that access key attributes in the most
efficient way it knows of: single key attributes do not go through
loops/conditionals to determine which attribute is being accessed
(with certain exceptions, 0004), attcacheoff-able indexes get the same
treatment as they do now, and indexes where attcacheoff cannot be used
for all key attributes get a special attribute iterator that
incrementally calculates the offset of each attribute in the current
index tuple (0005+0006).
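A toy model of that last case (hypothetical `ToyTuple` layout with one-byte length headers, deliberately nothing like the real varlena format): a plain index_getattr-style lookup without a cached offset must re-walk all earlier attributes on every call, while an iterator carries the previous offset forward, turning a left-to-right key scan from O(natts^2) into O(natts):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy tuple: variable-length attributes laid out back to back, each
 * prefixed by a one-byte payload-length header.
 */
typedef struct ToyTuple
{
    const unsigned char *data;
    int                  natts;
} ToyTuple;

/*
 * index_getattr-style access without a cached offset: every call has
 * to re-walk all earlier attributes to find where attno starts.
 */
static const unsigned char *
toy_getattr(const ToyTuple *tup, int attno)
{
    size_t off = 0;

    for (int i = 1; i < attno; i++)
        off += 1 + tup->data[off];          /* skip header + payload */
    return tup->data + off;
}

/*
 * Iterator-style access: the offset computed for the previous
 * attribute is carried forward, so each step is O(1).
 */
typedef struct ToyAttIter { size_t off; int attno; } ToyAttIter;

static void
toy_iter_init(ToyAttIter *it)
{
    it->off = 0;
    it->attno = 0;
}

static const unsigned char *
toy_iter_next(ToyAttIter *it, const ToyTuple *tup)
{
    if (it->attno > 0)
        it->off += 1 + tup->data[it->off];  /* step past previous att */
    it->attno++;
    return tup->data + it->off;
}
```

Both access paths return the same pointers; the iterator just avoids restarting from attribute 1 each time, which is the essence of the specialized iterator for attcacheoff-less key columns.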
Although patch 0002 is large, most of the modified lines are functions
being moved into different files. Once 0002 is understood, the
following patches should be fairly easy to understand as well.
= Why both features in one patchset?
The overhead of keeping track of the prefix in 0001 can add up to
several % of performance lost for the common index shape where dynamic
prefix truncation cannot be applied (measured 5-9%); single-column
unique indexes are the most sensitive to this. By adding the
single-key-column code specialization in 0004, we reduce other types
of overhead for the indexes most affected, which thus compensates for
the additional overhead in 0001, resulting in a net-neutral result.
= Benchmarks
I haven't re-run the benchmarks for this since v8 at [0] as I haven't
modified the patch significantly after that patch - only compiler
complaint fixes and changes required for rebasing. The results from
that benchmark: improvements vary between 'not significantly different
from HEAD' to '250+% improved throughput for select INSERT workloads,
and 360+% improved for select REINDEX workloads'. Graphs from that
benchmark are also attached now, as LibreOffice Calc wasn't good at
exporting the sheet with working graphs.
[0]: /messages/by-id/CAEze2WixWviBYTWXiFLbD3AuLT4oqGk_MykS_ssB=TudeZ=ajQ@mail.gmail.com
Attachments:
- v11-0001-Implement-dynamic-prefix-compression-in-nbtree.patch (+187/-35)
- v11-0002-Specialize-nbtree-functions-on-btree-key-shape.patch (+3625/-3151)
- v11-0003-Use-specialized-attribute-iterators-in-the-speci.patch (+92/-67)
- v11-0004-Optimize-nbts_attiter-for-nkeyatts-1-btrees.patch (+93/-3)
- v11-0005-Add-an-attcacheoff-populating-function.patch (+113/-1)
- v11-0006-btree-specialization-for-variable-length-multi-a.patch (+258/-10)
On Fri, Jun 23, 2023 at 2:21 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> == Dynamic prefix truncation (0001)
>
> The code now tracks how many prefix attributes of the scan key are
> already considered equal based on earlier binsrch results, and ignores
> those prefix columns in further binsrch operations (sorted list; if
> both the high and low value of your range have the same prefix, the
> middle value will have that prefix, too). This reduces the number of
> calls into opclass-supplied (dynamic) compare functions, and thus
> increases performance for multi-key-attribute indexes where shared
> prefixes are common (e.g. index on (customer, order_id)).
The idea looks good to me.
I was looking into the 0001 patch, and I have one confusion in the
below hunk in the _bt_moveright function: if the parent page's right
key exactly matches the HIGH key of the child page, then I suppose
that while doing the "_bt_compare" with the HIGH key we can use the
optimization, right? I.e., the column number from which we need to
start the comparison should be the one passed by the caller. But in
the below hunk, you are always passing 'cmpcol', which is 1. I think
this should be '*comparecol', because '*comparecol' will either hold
the value passed by the parent if the high key data exactly matches
the parent's right tuple, or it will hold 1 in case it doesn't match.
Am I missing something?
@@ -247,13 +256,16 @@ _bt_moveright(Relation rel,
{
....
+ if (P_IGNORE(opaque) ||
+ _bt_compare(rel, key, page, P_HIKEY, &cmpcol) >= cmpval)
+ {
+ *comparecol = 1;
}
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, 23 Jun 2023 at 11:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Jun 23, 2023 at 2:21 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>> == Dynamic prefix truncation (0001)
>>
>> [...]
>
> I think the idea looks good to me.
> I was looking into the 0001 patch,
Thanks for reviewing.
> and I have one confusion in the below hunk in the _bt_moveright
> function [...] Am I missing something?
We can't carry _bt_compare prefix results across pages, because the
key range of a page may shift while we're not holding a lock on that
page. That's also why the code resets the prefix to 1 every time it
accesses a new page ^1: it cannot guarantee correct results otherwise.
See also [0] and [1] for why that is important.
^1: When following downlinks, the code above your quoted code tries to
reuse the _bt_compare result of the parent page in the common case of
a child page's high key that is bytewise equal to the right separator
tuple of the parent page's downlink to this page. However, if it
detects that this child high key has changed (i.e. not 100% bytewise
equal), we can't reuse that result, and we'll have to re-establish all
prefix info on that page from scratch.
In any case, this only establishes the prefix for the right half of
the page's keyspace; the prefix of the left half of the data still
needs to be established separately.
I hope this explains the reasons for why we can't reuse comparecol as
_bt_compare argument.
Kind regards,
Matthias van de Meent
Neon, Inc.
[0]: /messages/by-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com
On Fri, Jun 23, 2023 at 8:16 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Fri, 23 Jun 2023 at 11:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> and I have one confusion in the below hunk in the _bt_moveright
>> function [...] Am I missing something?
>
> We can't carry _bt_compare prefix results across pages, because the
> key range of a page may shift while we're not holding a lock on that
> page. That's also why the code resets the prefix to 1 every time it
> accesses a new page ^1: it cannot guarantee correct results otherwise.
> See also [0] and [1] for why that is important.
Yeah, that makes sense.
> ^1: When following downlinks, the code above your quoted code tries to
> reuse the _bt_compare result of the parent page in the common case of
> a child page's high key that is bytewise equal to the right separator
> tuple of the parent page's downlink to this page. However, if it
> detects that this child high key has changed (i.e. not 100% bytewise
> equal), we can't reuse that result, and we'll have to re-establish all
> prefix info on that page from scratch.
>
> In any case, this only establishes the prefix for the right half of
> the page's keyspace; the prefix of the left half of the data still
> needs to be established separately.
>
> I hope this explains the reasons for why we can't reuse comparecol as
> _bt_compare argument.
Yeah got it, thanks for explaining this. Now I see you have explained
this in comments as well above the memcmp() statement.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com