Add a bound check to TidRangeEval

Started by Junwang Zhao7 months ago6 messages
#1Junwang Zhao
zhjwpku@gmail.com
1 attachment(s)

Hi hackers,

The comments of TidRangeEval saids:

```
Returns false if we detect the range cannot contain any tuples.
```

After narrowing the upper and lower bounds, we can add an
additional check to verify if the lower bound exceeds the
upper bound. If true, return false to avoid unnecessary calls
to table_beginscan_tidrange.

PFA the trivial patch.

--
Regards
Junwang Zhao

Attachments:

v1-0001-add-a-bound-check-to-TidRangeEval.patchapplication/octet-stream; name=v1-0001-add-a-bound-check-to-TidRangeEval.patchDownload
From c9b782d8c3ac5b8fc28c94313a6b64f3f9a4791a Mon Sep 17 00:00:00 2001
From: Junwang Zhao <zhjwpku@gmail.com>
Date: Sun, 8 Jun 2025 09:32:23 +0000
Subject: [PATCH v1] add a bound check to TidRangeEval

---
 src/backend/executor/nodeTidrangescan.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index ab2eab9596e..855189e22c0 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -202,6 +202,12 @@ TidRangeEval(TidRangeScanState *node)
 		}
 	}
 
+	if (ItemPointerCompare(&lowerBound, &upperBound) > 0)
+	{
+		/* The lower bound is greater than the upper bound, so no tuples match */
+		return false;
+	}
+
 	ItemPointerCopy(&lowerBound, &node->trss_mintid);
 	ItemPointerCopy(&upperBound, &node->trss_maxtid);
 
-- 
2.39.5

#2David Rowley
dgrowleyml@gmail.com
In reply to: Junwang Zhao (#1)
Re: Add a bound check to TidRangeEval

On Sun, 8 Jun 2025 at 21:41, Junwang Zhao <zhjwpku@gmail.com> wrote:

The comments of TidRangeEval saids:

```
Returns false if we detect the range cannot contain any tuples.
```

After narrowing the upper and lower bounds, we can add an
additional check to verify if the lower bound exceeds the
upper bound. If true, return false to avoid unnecessary calls
to table_beginscan_tidrange.

Do you think this is common enough to bother doing? It's not like
this is going on in the planner and allows further optimisations in
regards to row estimates to assist in join order decisions.

Also, if you're going to propose a performance optimisation, then it
should also come with some benchmark code and results to demonstrate
that there is a gain in performance. I don't think it should be up to
the reviewer or committer to do that for you. I imagine it's not
exciting enough to bother with, but feel free to prove that I'm wrong.

David

#3Junwang Zhao
zhjwpku@gmail.com
In reply to: David Rowley (#2)
Re: Add a bound check to TidRangeEval

Hi David,

On Mon, Jun 9, 2025 at 9:52 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, 8 Jun 2025 at 21:41, Junwang Zhao <zhjwpku@gmail.com> wrote:

The comments of TidRangeEval saids:

```
Returns false if we detect the range cannot contain any tuples.
```

After narrowing the upper and lower bounds, we can add an
additional check to verify if the lower bound exceeds the
upper bound. If true, return false to avoid unnecessary calls
to table_beginscan_tidrange.

Do you think this is common enough to bother doing? It's not like
this is going on in the planner and allows further optimisations in
regards to row estimates to assist in join order decisions.

This is not a common case, it's just a corner case while
playing around the TidRangeScan.

I'm not saying this is an optimization, it just makes me a little
confused when I see the lowerBound > upperBound and
it still returns true.

The comments say "Returns true if it's possible for the
range to contain tuples." This seems not fully accurate?

The case will be handled later while scanning the pages,
but why not handle it earlier if we already know there is no
chance of the range containing any tuples.

Also, if you're going to propose a performance optimisation, then it
should also come with some benchmark code and results to demonstrate
that there is a gain in performance. I don't think it should be up to
the reviewer or committer to do that for you. I imagine it's not
exciting enough to bother with, but feel free to prove that I'm wrong.

This is not a performance optimization, so I never try to come
up with any benchmark. Sorry if I made you feel that way.

David

--
Regards
Junwang Zhao

#4David Rowley
dgrowleyml@gmail.com
In reply to: Junwang Zhao (#3)
1 attachment(s)
Re: Add a bound check to TidRangeEval

On Wed, 11 Jun 2025 at 14:26, Junwang Zhao <zhjwpku@gmail.com> wrote:

This is not a common case, it's just a corner case while
playing around the TidRangeScan.

I'm not saying this is an optimization, it just makes me a little
confused when I see the lowerBound > upperBound and
it still returns true.

The comments say "Returns true if it's possible for the
range to contain tuples." This seems not fully accurate?

The word "possible" there is meant to convey that false negatives are
*not* ok, but false positives are fine. I'm certainly open to making
the comment better if that's not clear.

How about adding "We don't bother validating that trss_mintid is less
than or equal to trss_maxtid, as the scan_set_tidrange() table AM
function will detect that."

The scan_set_tidrange function does document that it's the
implementation's responsibility to check this, as the comment in the
declaration of struct TableAmRoutine reads:

* Implementations of scan_set_tidrange must themselves handle
* ItemPointers of any value. i.e, they must handle each of the following:
*
* 1) mintid or maxtid is beyond the end of the table; and
* 2) mintid is above maxtid; and
* 3) item offset for mintid or maxtid is beyond the maximum offset
* allowed by the AM.

So I'm not keen to add code to TidRangeEval to check for this as it
would result in additional overhead for valid usages of TID Range
scans.

David

Attachments:

TidRangeEval_comment.patchapplication/octet-stream; name=TidRangeEval_comment.patchDownload
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index ab2eab9596e..471c79689c9 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -128,9 +128,11 @@ TidExprListCreate(TidRangeScanState *tidrangestate)
  *		TidRangeEval
  *
  *		Compute and set node's block and offset range to scan by evaluating
- *		the trss_tidexprs.  Returns false if we detect the range cannot
+ *		node->trss_tidexprs.  Returns false if we detect the range cannot
  *		contain any tuples.  Returns true if it's possible for the range to
- *		contain tuples.
+ *		contain tuples.  We don't bother validating that trss_mintid is less
+ *		than or equal to trss_maxtid, as the scan_set_tidrange() table AM
+ *		function will detect that.
  * ----------------------------------------------------------------
  */
 static bool
#5Junwang Zhao
zhjwpku@gmail.com
In reply to: David Rowley (#4)
Re: Add a bound check to TidRangeEval

Hi David,

On Sat, Jun 14, 2025 at 11:27 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 11 Jun 2025 at 14:26, Junwang Zhao <zhjwpku@gmail.com> wrote:

This is not a common case, it's just a corner case while
playing around the TidRangeScan.

I'm not saying this is an optimization, it just makes me a little
confused when I see the lowerBound > upperBound and
it still returns true.

The comments say "Returns true if it's possible for the
range to contain tuples." This seems not fully accurate?

The word "possible" there is meant to convey that false negatives are
*not* ok, but false positives are fine. I'm certainly open to making
the comment better if that's not clear.

How about adding "We don't bother validating that trss_mintid is less
than or equal to trss_maxtid, as the scan_set_tidrange() table AM
function will detect that."

Sounds good to me. This will naturally direct readers to the
comments in scan_set_tidrange.

The scan_set_tidrange function does document that it's the
implementation's responsibility to check this, as the comment in the
declaration of struct TableAmRoutine reads:

* Implementations of scan_set_tidrange must themselves handle
* ItemPointers of any value. i.e, they must handle each of the following:
*
* 1) mintid or maxtid is beyond the end of the table; and
* 2) mintid is above maxtid; and
* 3) item offset for mintid or maxtid is beyond the maximum offset
* allowed by the AM.

So I'm not keen to add code to TidRangeEval to check for this as it
would result in additional overhead for valid usages of TID Range
scans.

Agree, thanks for the explanation.

David

--
Regards
Junwang Zhao

#6David Rowley
dgrowleyml@gmail.com
In reply to: Junwang Zhao (#5)
Re: Add a bound check to TidRangeEval

On Sat, 14 Jun 2025 at 16:10, Junwang Zhao <zhjwpku@gmail.com> wrote:

On Sat, Jun 14, 2025 at 11:27 AM David Rowley <dgrowleyml@gmail.com> wrote:

How about adding "We don't bother validating that trss_mintid is less
than or equal to trss_maxtid, as the scan_set_tidrange() table AM
function will detect that."

Sounds good to me. This will naturally direct readers to the
comments in scan_set_tidrange.

Thanks for looking. Pushed, after changing "detect" into "handle".

David