Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

Started by Peter Geogheganover 2 years ago31 messageshackers
Jump to latest

As I recently went into on the thread where we've been discussing my
nbtree SAOP patch [1]/messages/by-id/CAH2-Wz=BuxYEHxpNH0tPvo=+G1WtE1PamRoVU1dEVow1Vy9Y7A@mail.gmail.com -- Peter Geoghegan, there is good reason to suspect that one of the
optimizations added by commit e0b1ee17 is buggy in the presence of an
opfamily lacking the full set of cross-type comparisons. The attached
test case confirms that these suspicions were correct. Running the
tese case against HEAD will lead to an assertion failure (or a wrong
answer when assertions are disabled).

To recap, the optimization in question (which is not to be confused
with the "precheck" optimization from the same commit) is based on the
idea that _bt_first must always land the scan ahead of the position
that the scan would end on, were the scan direction to change (from
forwards to backwards, say). It follows that inequality strategy scan
keys that are required in the opposite-to-scan direction *only* must
be redundant in the current scan direction (in the sense that
_bt_checkkeys needn't bother comparing them at all). Unfortunately,
that rationale is at least slightly wrong.

Although some version of the same assumption must really hold in the
case of required equality strategy scan keys (old comments in
_bt_checkkeys and in _bt_first say as much), it isn't really
guaranteed in the case of inequalities. In fact, the following
sentence appears in old comments above _bt_preprocess_keys, directly
contradicting the theory behind the optimization in question:

"In general, when inequality keys are present, the initial-positioning
code only promises to position before the first possible match, not
exactly at the first match, for a forward scan; or after the last
match for a backward scan."

My test case mostly just demonstrates how to reproduce the scenario
described by this sentence.

It's probably possible to salvage the optimization, but that will
require bookkeeping sufficient to detect these unsafe cases, so that
_bt_checkkeys only skips the comparisons when it's truly safe. As far
as I know, the only reason that inequalities differ from equalities is
this respect is the issue that the test case highlights. (There were
also issues with NULLs, but AFAICT Alexander dealt with that aspect of
the problem already.)

[1]: /messages/by-id/CAH2-Wz=BuxYEHxpNH0tPvo=+G1WtE1PamRoVU1dEVow1Vy9Y7A@mail.gmail.com -- Peter Geoghegan
--
Peter Geoghegan

Attachments:

require_opposite_dir_repro.sqlapplication/octet-stream; name=require_opposite_dir_repro.sqlDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#1)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

Peter Geoghegan <pg@bowt.ie> writes:

As I recently went into on the thread where we've been discussing my
nbtree SAOP patch [1], there is good reason to suspect that one of the
optimizations added by commit e0b1ee17 is buggy in the presence of an
opfamily lacking the full set of cross-type comparisons. The attached
test case confirms that these suspicions were correct. Running the
tese case against HEAD will lead to an assertion failure (or a wrong
answer when assertions are disabled).

Hmm ... I had not paid any attention to this commit, but the rationale
given in the commit message is just flat wrong:

Imagine the ordered B-tree scan for the query like this.

SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col;

The (col > 'a') scan key will be always matched once we find the location to
start the scan. The (col < 'b') scan key will match every item on the page
as long as it matches the last item on the page.

That argument probably holds for the index's first column, but it is
completely and obviously wrong for every following column. Nonetheless
it looks like we're trying to apply the optimization to every scan key.

regards, tom lane

In reply to: Tom Lane (#2)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Tue, Dec 5, 2023 at 4:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... I had not paid any attention to this commit, but the rationale
given in the commit message is just flat wrong:

Imagine the ordered B-tree scan for the query like this.

SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col;

The (col > 'a') scan key will be always matched once we find the location to
start the scan. The (col < 'b') scan key will match every item on the page
as long as it matches the last item on the page.

That argument probably holds for the index's first column, but it is
completely and obviously wrong for every following column. Nonetheless
it looks like we're trying to apply the optimization to every scan key.

Just to be clear, you're raising a concern that seems to me to apply
to "the other optimization" from the same commit, specifically -- the
precheck optimization. Not the one I found a problem in. (They're
closely related but distinct optimizations.)

I *think* that that part is handled correctly, because non-required
scan keys are not affected (by either optimization). I have no
specific reason to doubt the proposition that 'b' could only be marked
required in situations where it is indeed safe to assume that the col
< 'b' condition must also apply to earlier tuples transitively (i.e.
this must be true because it was true for the the final tuple on the
page during the _bt_readpage precheck).

That being said, I wouldn't rule out problems for the precheck
optimization in the presence of opfamilies like the one from my test
case. I didn't get as far as exploring that side of things, at least.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#1)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

"In general, when inequality keys are present, the initial-positioning
code only promises to position before the first possible match, not
exactly at the first match, for a forward scan; or after the last
match for a backward scan."

My test case mostly just demonstrates how to reproduce the scenario
described by this sentence.

I just realized that my test case wasn't quite minimized correctly. It
depended on a custom function that was no longer created.

Attached is a revised version that uses btint84cmp instead.

--
Peter Geoghegan

Attachments:

require_opposite_dir_repro_v2.sqlapplication/octet-stream; name=require_opposite_dir_repro_v2.sqlDownload
#5Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Geoghegan (#4)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

Hi, Peter!

On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

"In general, when inequality keys are present, the initial-positioning
code only promises to position before the first possible match, not
exactly at the first match, for a forward scan; or after the last
match for a backward scan."

My test case mostly just demonstrates how to reproduce the scenario
described by this sentence.

I just realized that my test case wasn't quite minimized correctly. It
depended on a custom function that was no longer created.

Attached is a revised version that uses btint84cmp instead.

Thank you for raising this issue. Preprocessing of btree scan keys is
normally removing the redundant scan keys. However, redundant scan
keys aren't removed when they have arguments of different types.
Please give me a bit of time to figure out how to workaround this.

------
Regards,
Alexander Korotkov

In reply to: Alexander Korotkov (#5)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Thank you for raising this issue. Preprocessing of btree scan keys is
normally removing the redundant scan keys. However, redundant scan
keys aren't removed when they have arguments of different types.
Please give me a bit of time to figure out how to workaround this.

Couldn't you condition the use of the optimization on
_bt_preprocess_keys being able to use cross-type operators when it
checked for redundant or contradictory scan keys? The vast majority of
index scans will be able to do that.

As I said already, what you're doing here isn't all that different to
the way that we rely on required equality strategy scan keys being
used to build our initial insertion scan key, that determines where
the scan is initially positioned to within _bt_first. Inequalities
aren't all that different to equalities.

--
Peter Geoghegan

#7Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#3)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan <pg@bowt.ie> wrote:

Just to be clear, you're raising a concern that seems to me to apply
to "the other optimization" from the same commit, specifically -- the
precheck optimization. Not the one I found a problem in. (They're
closely related but distinct optimizations.)

It isn't very clear from the commit message that this commit is doing
two different things, and in fact I'm still unclear on what exactly
the other optimization is.

--
Robert Haas
EDB: http://www.enterprisedb.com

#8Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Robert Haas (#7)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Wed, 6 Dec 2023 at 14:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan <pg@bowt.ie> wrote:

Just to be clear, you're raising a concern that seems to me to apply
to "the other optimization" from the same commit, specifically -- the
precheck optimization. Not the one I found a problem in. (They're
closely related but distinct optimizations.)

It isn't very clear from the commit message that this commit is doing
two different things, and in fact I'm still unclear on what exactly
the other optimization is.

I feel that Peter refered to these two distinct optimizations:

1. When scanning an index in ascending order using scankey a > 1 (so,
one that defines a start point of the scan), we don't need to check
items for consistency with that scankey once we've found the first
value that is consistent with the scankey, as all future values will
also be consistent with the scankey (if we assume no concurrent page
deletions).

2. When scanning an index in ascending order using scankey a < 10 (one
that defines an endpoint of the scan), we can look ahead and check if
the last item on the page is consistent. If so, then all other items
on the page will also be consistent with that scankey.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#9Robert Haas
robertmhaas@gmail.com
In reply to: Matthias van de Meent (#8)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Wed, Dec 6, 2023 at 8:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I feel that Peter refered to these two distinct optimizations:

1. When scanning an index in ascending order using scankey a > 1 (so,
one that defines a start point of the scan), we don't need to check
items for consistency with that scankey once we've found the first
value that is consistent with the scankey, as all future values will
also be consistent with the scankey (if we assume no concurrent page
deletions).

2. When scanning an index in ascending order using scankey a < 10 (one
that defines an endpoint of the scan), we can look ahead and check if
the last item on the page is consistent. If so, then all other items
on the page will also be consistent with that scankey.

Oh, interesting. Thanks.

--
Robert Haas
EDB: http://www.enterprisedb.com

In reply to: Matthias van de Meent (#8)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Wed, 6 Dec 2023 at 14:11, Robert Haas <robertmhaas@gmail.com> wrote:

It isn't very clear from the commit message that this commit is doing
two different things, and in fact I'm still unclear on what exactly
the other optimization is.

I feel that Peter refered to these two distinct optimizations:

Right.

2. When scanning an index in ascending order using scankey a < 10 (one
that defines an endpoint of the scan), we can look ahead and check if
the last item on the page is consistent. If so, then all other items
on the page will also be consistent with that scankey.

Also worth noting that it could be "scankey a = 10". That is, the
precheck optimization (i.e. the optimization that's not the target of
my test case) can deal with equalities and inequalities just as well
(any scan key that's required in the current scan direction is
supported). On the other hand the required-in-opposite-direction-only
optimization (i.e. the target of my test case) only applies to
inequality strategy scan keys.

It kinda makes sense to explain both concepts using an example that
involves both > and < strategy inequalities, since that makes the
symmetry between the two optimizations clearer.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#6)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Tue, Dec 5, 2023 at 8:20 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Thank you for raising this issue. Preprocessing of btree scan keys is
normally removing the redundant scan keys. However, redundant scan
keys aren't removed when they have arguments of different types.
Please give me a bit of time to figure out how to workaround this.

Couldn't you condition the use of the optimization on
_bt_preprocess_keys being able to use cross-type operators when it
checked for redundant or contradictory scan keys? The vast majority of
index scans will be able to do that.

Some quick experimentation shows that my test case works as expected
once _bt_preprocess_keys is taught to remember that it has seen a
maybe-unsafe case, which it stashes in a special new field from the
scan's state for later. As I said, this field can be treated as a
condition of applying the required-in-opposite-direction-only
optimization in _bt_readpage().

This new field would be analogous to the existing
requiredMatchedByPrecheck state used by _bt_readpage() to determine
whether it can apply the required-in-same-direction optimization. The
new field works for the whole scan instead of just for one page, and
it works based on information from "behind the scan" instead of
information "just ahead of the scan". But the basic idea is the same.

_bt_preprocess_keys is rather complicated. It is perhaps tempting to
do this in a targeted way, that specifically limits itself to the exact
cases that we know to be unsafe. But it might be okay to just disable
the optimization in most or all cases where _bt_compare_scankey_args()
returns false. That's likely to be very rare in practice, anyway (who
really uses opfamilies like these?). Not really sure where to come
down on that.

--
Peter Geoghegan

In reply to: Matthias van de Meent (#8)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

1. When scanning an index in ascending order using scankey a > 1 (so,
one that defines a start point of the scan), we don't need to check
items for consistency with that scankey once we've found the first
value that is consistent with the scankey, as all future values will
also be consistent with the scankey (if we assume no concurrent page
deletions).

BTW, I don't think that page deletion is a concern for these
optimizations in the way that it is for the similar idea of "dynamic
prefix compression", which works against insertion-type scan keys
(used to descend the tree and to do an initial binary search of a leaf
page).

We already rely on the first call to _bt_readpage (the one that takes
place from within _bt_first rather than from _bt_next) passing a page
offset number that's exactly at the start of where matches begin --
this is crucial in the case of scans with required equality strategy
scan keys (most scans). If we just skipped the _bt_binsrch and passed
P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that
would break lots of queries. So the _bt_binsrch step within _bt_first
isn't just an optimization -- it's crucial. This is nothing new.

Recall that _bt_readpage only deals with search-type scan keys,
meaning scan keys that use a simple operator (so it uses = operators
with the equality strategy, as opposed to using a 3-way ORDER
proc/support function 1 that can tell the difference between < and >).
In general _bt_readpage doesn't know how to tell the difference
between a tuple that's before the start of = matches, and a tuple
that's at (or after) the end of any = matches. If it is ever allowed
to conflate these two cases, then we'll overlook matching tuples,
which is of course wrong (it'll terminate the scan before it even
starts). It is up to the caller (really just _bt_first) to never call
_bt_readpage in a way that allows this confusion to take place --
which is what makes the _bt_binsrch step crucial.

A race condition with page deletion might allow the key space covered
by a leaf page to "widen" after we've left its parent, but before we
arrive on the leaf page. But the _bt_binsrch step within _bt_first
happens *after* we land on and lock that leaf page, in any case. So
there is no risk of the scan ever doing anything with
concurrently-inserted index tuples. In general we only have to worry
about such race conditions when descending the tree -- they're not a
problem after the scan has reached the leaf level and established an
initial page offset number. (The way that _bt_readpage processes whole
pages in one atomic step helps with this sort of thing, too. We can
almost pretend that the B-Tree structure is immutable, even though
that's obviously not really true at all. We know that we'll always
make forward progress through the key space by remembering the next
page to visit when processing each page.)

My test case broke the required-in-opposite-direction-only
optimization by finding a way in which
required-in-opposite-direction-only inequalities were not quite the
same as required equalities with respect to this business about the
precise leaf page offset number that the scan begins at. They make
*almost* the same set of guarantees (note in particular that both will
be transformed into insertion scan key/3-way ORDER proc scan keys by
_bt_first's initial positioning code), but there is at least one
special case that applies only to inequalities. I had to play games
with weird incomplete opfamilies to actually break the optimization --
that was required to tickle the special case in just the right/wrong
way.

--
Peter Geoghegan

#13Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#12)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Wed, 6 Dec 2023 at 19:55, Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

1. When scanning an index in ascending order using scankey a > 1 (so,
one that defines a start point of the scan), we don't need to check
items for consistency with that scankey once we've found the first
value that is consistent with the scankey, as all future values will
also be consistent with the scankey (if we assume no concurrent page
deletions).

BTW, I don't think that page deletion is a concern for these
optimizations in the way that it is for the similar idea of "dynamic
prefix compression", which works against insertion-type scan keys
(used to descend the tree and to do an initial binary search of a leaf
page).

We already rely on the first call to _bt_readpage (the one that takes
place from within _bt_first rather than from _bt_next) passing a page
offset number that's exactly at the start of where matches begin --
this is crucial in the case of scans with required equality strategy
scan keys (most scans). If we just skipped the _bt_binsrch and passed
P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that
would break lots of queries. So the _bt_binsrch step within _bt_first
isn't just an optimization -- it's crucial. This is nothing new.

I was thinking more along the lines of page splits+deletions while
we're doing _bt_stepright(), but forgot to consider that we first lock
the right sibling, and only then release the left sibling for splits,
so we should be fine here.

Kind regards,

Matthias van de Meent

In reply to: Matthias van de Meent (#13)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Wed, Dec 6, 2023 at 11:14 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I was thinking more along the lines of page splits+deletions while
we're doing _bt_stepright(), but forgot to consider that we first lock
the right sibling, and only then release the left sibling for splits,
so we should be fine here.

In general the simplest (and possibly most convincing) arguments for
the correctness of optimizations like the ones that Alexander added
rely on seeing that the only way that the optimization can be wrong is
if some more fundamental and long established thing was also wrong. We
could try to prove that the new optimization is correct (or wrong),
but it is often more helpful to "prove" that some much more
fundamental thing is correct instead, if that provides us with a
useful corollary about the new thing also being correct.

Take the _bt_readpage precheck optimization, for example. Rather than
thinking about the key space and transitive rules, it might be more
helpful to focus on what must have been true in earlier Postgres
versions, and what we can expect to still hold now. The only way that
that optimization could be wrong is if the same old _bt_checkkeys
logic that decides when to terminate the scan (by setting
continuescan=false) always had some propensity to "change its mind"
about ending the scan, at least when it somehow got the opportunity to
see tuples after the first tuple that it indicated should end the
scan. That's not quite bulletproof, of course (it's not like older
Postgres versions actually provided _bt_checkkeys with opportunities
to "change its mind" in this sense), but it's a useful starting point
IME. It helps to build intuition.

--
Peter Geoghegan

#15Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#5)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Wed, Dec 6, 2023 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

"In general, when inequality keys are present, the initial-positioning
code only promises to position before the first possible match, not
exactly at the first match, for a forward scan; or after the last
match for a backward scan."

My test case mostly just demonstrates how to reproduce the scenario
described by this sentence.

I just realized that my test case wasn't quite minimized correctly. It
depended on a custom function that was no longer created.

Attached is a revised version that uses btint84cmp instead.

Thank you for raising this issue. Preprocessing of btree scan keys is
normally removing the redundant scan keys. However, redundant scan
keys aren't removed when they have arguments of different types.
Please give me a bit of time to figure out how to workaround this.

I dig into the problem. I think this assumption is wrong in my commit.

"When the key is required for opposite direction scan, it must be
already satisfied by_bt_first() ..."

In your example "foo = 90" is satisfied by_bt_first(), but "foo >
99::int8" is not. I think this could be resolved by introducing a
separate flag exactly distinguishing scan keys used for _bt_first().
I'm going to post the patch doing this.

------
Regards,
Alexander Korotkov

#16Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#15)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Fri, Dec 8, 2023 at 8:30 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Dec 6, 2023 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

"In general, when inequality keys are present, the initial-positioning
code only promises to position before the first possible match, not
exactly at the first match, for a forward scan; or after the last
match for a backward scan."

My test case mostly just demonstrates how to reproduce the scenario
described by this sentence.

I just realized that my test case wasn't quite minimized correctly. It
depended on a custom function that was no longer created.

Attached is a revised version that uses btint84cmp instead.

Thank you for raising this issue. Preprocessing of btree scan keys is
normally removing the redundant scan keys. However, redundant scan
keys aren't removed when they have arguments of different types.
Please give me a bit of time to figure out how to workaround this.

I dig into the problem. I think this assumption is wrong in my commit.

"When the key is required for opposite direction scan, it must be
already satisfied by_bt_first() ..."

In your example "foo = 90" is satisfied by_bt_first(), but "foo >
99::int8" is not. I think this could be resolved by introducing a
separate flag exactly distinguishing scan keys used for _bt_first().
I'm going to post the patch doing this.

The draft patch is attached. It requires polishing and proper
commenting. But I hope the basic idea is clear. Do you think this is
the way forward?

------
Regards,
Alexander Korotkov

Attachments:

fix_btree_skip_scan_keys_optimization-v1.patchapplication/octet-stream; name=fix_btree_skip_scan_keys_optimization-v1.patchDownload+9-6
In reply to: Alexander Korotkov (#16)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Fri, Dec 8, 2023 at 10:46 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

In your example "foo = 90" is satisfied by_bt_first(), but "foo >
99::int8" is not. I think this could be resolved by introducing a
separate flag exactly distinguishing scan keys used for _bt_first().
I'm going to post the patch doing this.

The draft patch is attached. It requires polishing and proper
commenting. But I hope the basic idea is clear. Do you think this is
the way forward?

Does this really need to work at the scan key level, rather than at
the whole-scan level? Wouldn't it make more sense to just totally
disable it for the whole scan, since we'll barely ever need to do that
anyway?

My ScalarArrayOpExpr patch will need to disable this optimization,
since with that patch in place we don't necessarily go through
_bt_first each time the search-type scan keys must change. We might
need to check a few tuples from before the _bt_first-wise position of
the next set of array values, which is a problem with
opposite-direction-only inequalities (it's a little bit like the
situation from my test case, actually). That's partly why I'd prefer
this to work at the whole-scan level (though I also just don't think
that inventing SK_BT_BT_FIRST makes much sense).

I think that you should make it clearer that this whole optimization
only applies to required *inequalities*, which can be required in the
opposite direction *only*. It should be more obvious from looking at
the code that the optimization doesn't apply to required equality
strategy scan keys (those are always required in *both* scan
directions or in neither direction, so unlike the closely related
prefix optimization added by the same commit, they just can't use the
optimization that we need to fix here).

BTW, do we really need to keep around the BTScanOpaqueData.firstPage
field? Why can't the call to _bt_readpage from _bt_first (and from
_bt_endpoint) just pass "firstPage=true" as a simple argument? Note
that the first call to _bt_readpage must take place from _bt_first (or
from _bt_endpoint). The first _bt_first call is already kind of
special, in a way that is directly related to this issue. I added some
comments about that to today's commit c9c0589fda, in fact -- I think
it's an important issue in general.

--
Peter Geoghegan

#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Geoghegan (#17)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

Hi, Peter!

On Sat, Dec 9, 2023 at 4:29 AM Peter Geoghegan <pg@bowt.ie> wrote:

Does this really need to work at the scan key level, rather than at
the whole-scan level? Wouldn't it make more sense to just totally
disable it for the whole scan, since we'll barely ever need to do that
anyway?

My ScalarArrayOpExpr patch will need to disable this optimization,
since with that patch in place we don't necessarily go through
_bt_first each time the search-type scan keys must change. We might
need to check a few tuples from before the _bt_first-wise position of
the next set of array values, which is a problem with
opposite-direction-only inequalities (it's a little bit like the
situation from my test case, actually). That's partly why I'd prefer
this to work at the whole-scan level (though I also just don't think
that inventing SK_BT_BT_FIRST makes much sense).

I think that you should make it clearer that this whole optimization
only applies to required *inequalities*, which can be required in the
opposite direction *only*. It should be more obvious from looking at
the code that the optimization doesn't apply to required equality
strategy scan keys (those are always required in *both* scan
directions or in neither direction, so unlike the closely related
prefix optimization added by the same commit, they just can't use the
optimization that we need to fix here).

BTW, do we really need to keep around the BTScanOpaqueData.firstPage
field? Why can't the call to _bt_readpage from _bt_first (and from
_bt_endpoint) just pass "firstPage=true" as a simple argument? Note
that the first call to _bt_readpage must take place from _bt_first (or
from _bt_endpoint). The first _bt_first call is already kind of
special, in a way that is directly related to this issue. I added some
comments about that to today's commit c9c0589fda, in fact -- I think
it's an important issue in general.

Please, check the attached patchset.

The first patch removes the BTScanOpaqueData.firstPage field as you
proposed. I think this is a good idea, thank you for the proposal.

Regarding the requiredOppositeDir bug. I don't want to lose the
generality of the optimization. I could work for different cases, for
example.
WHERE col1 > val1 AND col1 < val2
WHERE col1 = val1 AND col2 > val2 AND col2 < val3
WHERE col1 = val1 AND col2 = val2 AND col3 > val3 AND col3 < val4
And there could be additional scan keys, which shouldn't be skipped.
But that shouldn't mean we shouldn't skip others.

See the second patch for my next proposal to fix the problem. Instead
of relying on _bt_first(), let's rely on the first matched item on the
page. Once it's found, we may skip scan keys required for the opposite
direction. What do you think?

------
Regards,
Alexander Korotkov

In reply to: Alexander Korotkov (#18)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

Will you be in Prague this week? If not this might have to wait.

#20Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#18)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

On Mon, Dec 11, 2023 at 5:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

BTW, do we really need to keep around the BTScanOpaqueData.firstPage
field? Why can't the call to _bt_readpage from _bt_first (and from
_bt_endpoint) just pass "firstPage=true" as a simple argument? Note
that the first call to _bt_readpage must take place from _bt_first (or
from _bt_endpoint). The first _bt_first call is already kind of
special, in a way that is directly related to this issue. I added some
comments about that to today's commit c9c0589fda, in fact -- I think
it's an important issue in general.

Please, check the attached patchset.

Sorry, I forgot the attachment. Here it is.

------
Regards,
Alexander Korotkov

Attachments:

0001-Remove-BTScanOpaqueData.firstPage-v2.patchapplication/octet-stream; name=0001-Remove-BTScanOpaqueData.firstPage-v2.patchDownload+8-15
0002-Fix-requiredOppositeDir-bug-v2.patchapplication/octet-stream; name=0002-Fix-requiredOppositeDir-bug-v2.patchDownload+12-10
#21Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Geoghegan (#19)
#22Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#21)
#23Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#22)
#24Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pavel Borisov (#23)
#25Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#24)
#26Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pavel Borisov (#25)
#27Anton A. Melnikov
a.melnikov@postgrespro.ru
In reply to: Alexander Korotkov (#26)
#28Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Anton A. Melnikov (#27)
#29Anton A. Melnikov
a.melnikov@postgrespro.ru
In reply to: Pavel Borisov (#28)
#30Maxim Orlov
orlovmg@gmail.com
In reply to: Anton A. Melnikov (#29)
#31Alexander Korotkov
aekorotkov@gmail.com
In reply to: Maxim Orlov (#30)