Adding support for Row Compares to nbtree startikey optimization

Started by Peter Geoghegan10 months ago5 messageshackers
Jump to latest

Postgres 18 commit 8a510275 taught nbtree to apply page-level context
to determine an offset into the so->keyData[] array
(pstate.ikey/startikey) that works for the duration of a call to
_bt_readpage. Affected _bt_readpage calls have _bt_checkkeys calls
that use fewer CPU cycles, since all the _bt_checkkeys calls will be
able to start from the scan key/offset.

Postgres 18 never got support for row compare keys here:
_bt_set_startikey isn't willing to continue past a row compare key.
This seems like a missed opportunity. Row compares are generally a
decent target for the optimization in practice. They're at least as
good a target as scalar inequality keys were.

Attached patch remedies the situation by teaching _bt_set_startikey
about row compare members. It's reasonably straightforward, and makes
_bt_set_startikey "feature complete".

Testing has shown that it's about as effective as you'd expect -- it
generally manages to avoid evaluating row compares on scans that need
to read more than a couple of leaf pages. For example, if I run the
following test query:

select a, b, c
from fuzz_skip_scan
where (a, b, c) >= (11, 1, 71) and (a, b, c) <= (18, 1, 1)
order by a desc, b desc, c desc, d desc;

Custom instrumentation shows that most individual pages will be able
to skip/avoid evaluating *both* row compares, on *most* individual
pages. The first page read will never call _bt_set_startikey to begin
with (nothing new about that), and the last page will only be able to
skip one out of the two keys. That seems worth having.

The only notable limitation is that the optimization is ineffective on
pages where the deciding attribute (the attribute that most/all calls
to _bt_check_rowcompare end on) happens to contain NULLs. This is a
natural consequence of how row compares work: NULLs make it possible
for a tuple to be (say) ">=" a row compare qual, without the tuple
satisfying the qual, *and* without the NULLs indicating the absolute
end of matching tuples in the current scan direction. This is possible
with lower-order row compare member keys, at least.

I'll submit this to the next open commitfest for Postgres 19.
--
Peter Geoghegan

Attachments:

v1-0001-Teach-nbtree-to-set-pstate.ikey-beyond-a-row-comp.patchapplication/octet-stream; name=v1-0001-Teach-nbtree-to-set-pstate.ikey-beyond-a-row-comp.patchDownload+134-33
#2Chao Li
li.evan.chao@gmail.com
In reply to: Peter Geoghegan (#1)
Re: Adding support for Row Compares to nbtree startikey optimization

I tested the patch. Basically all of my test cases passed. Just a few small comments:

+                               if (subkey->sk_attno > firstchangingattnum) /* >, not >= */
+                                       break;

Can we enhance the comment to something like ">, not >= because firstchangingattnum starting from 1".

+                               /*
+                                * Compare the last tuple's datum for this row compare member
+                                * (unless we already know that lasttup must satisfy qual)
+                                */
+                               if (!lastsatisfied)
+                               {

When subkey->sk_attno <= firstchangingattnum, if first is not satisfied, then last should not be satisfied either, so we can skip check last.

The new status of this patch is: Waiting on Author

#3Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#2)
Re: Adding support for Row Compares to nbtree startikey optimization

On Aug 15, 2025, at 14:37, Chao Li <li.evan.chao@gmail.com> wrote:

When subkey->sk_attno <= firstchangingattnum, if first is not satisfied, then last should not be satisfied either, so we can skip check last.

Sorry for a typo, "subkey->sk_attno <= firstchangingattnum” should be "subkey->sk_attno < firstchangingattnum”. It should be “<“ but “<=“.

Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/

In reply to: Chao Li (#2)
Re: Adding support for Row Compares to nbtree startikey optimization

On Fri, Aug 15, 2025 at 2:38 AM Chao Li <li.evan.chao@gmail.com> wrote:

I tested the patch. Basically all of my test cases passed.

Thanks for the review.

+                               if (subkey->sk_attno > firstchangingattnum) /* >, not >= */
+                                       break;

Can we enhance the comment to something like ">, not >= because firstchangingattnum starting from 1".

But that's not the reason why the condition is ">, not >=". The
condition really is ">=" later in the same function, when we deal with
= strategy keys. If you think about how equalities and inequalities
need to work with firstchangingattnum, you'll understand why this is
(note that row compares are always inequalities, and have almost the
same rules as plain scalar inequalities). I'll give you a summary, in
case that helps.

With equalities, the index attribute for the key must have exactly one
distinct value across the whole page. They must *also* be preceded by
0 or more higher-order attributes that also have only one distinct
value among tuples from the page. Otherwise, we cannot use firsttup
and lasttup to prove that the equality must be satisfied by every
tuple on the page. That's why we must break on ">=".

With inequalities, the rules are less strict: we *can* break only on
">". It is okay if there are multiple values among the tuples on the
page in respect of the attribute for the key/subkey (provided that the
attribute is firstchangingattnum, not some later attribute) -- as long
as firsttup and lasttup are both satisfied by the inequality.

Notice that with equalities (but not inequalities) it isn't possible
for firsttup and lasttup to both be satisfied when the equality key's
attribute is firstchangingattnum -- since there are multiple distinct
values for the attribute, and an = condition can only be equal to one
specific value (never multiple values). This is even true with =
ANY()/ScalarArrayOp conditions, since there is no way to know that all
the index tuples have matches in the array when "attr ==
firstchangingattnum" (we really do have to read every tuple from the
page to give correct results for a = ANY()/ScalarArrayOp key).

+                               /*
+                                * Compare the last tuple's datum for this row compare member
+                                * (unless we already know that lasttup must satisfy qual)
+                                */
+                               if (!lastsatisfied)
+                               {

When subkey->sk_attno <= firstchangingattnum, if first is not satisfied, then last should not be satisfied either, so we can skip check last.

That's not quite true: when "subkey->sk_attno == firstchangingattnum",
it's possible for the subkey to be satisfied by only one of the two
tuples (either firsttup or lasttup) -- so we really do need to check
both.

I believe that a weaker version of this idea is correct, though: we
could avoid directly testing the subkey against lasttup when
"subkey->sk_attno < firstchangingattnum", provided we make sure that
at least firsttup's corresponding datum (firstdatum) satisfies the
subkey. This must be correct because the exact same datum will be used
against the same subkey a second time -- the answer cannot possibly
differ between firsttup and lasttup. I don't think that that
optimization is particularly worthwhile here, though; it adds
additional complexity, for a saving that is hardly noticeable at all.

The value of this optimization comes from the fact that we can save
hundreds of calls to _bt_check_rowcompare for each tuple on the page
(when it's safe to set pstate.ikey to a value that's past some
RowCompare key's offset in so->keyData[]). A minor, rare saving in
"startup costs" just doesn't seem worth the trouble to me.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#4)
Re: Adding support for Row Compares to nbtree startikey optimization

On Sun, Sep 14, 2025 at 5:47 PM Peter Geoghegan <pg@bowt.ie> wrote:

Thanks for the review.

Attached is v2, which is just v1 with some extra polish, and a real
commit message.

I plan on committing this in the next couple of days.

Thanks
--
Peter Geoghegan

Attachments:

v2-0001-Teach-nbtree-to-avoid-evaluating-row-compare-keys.patchapplication/octet-stream; name=v2-0001-Teach-nbtree-to-avoid-evaluating-row-compare-keys.patchDownload+133-33