Partial hash index is not used for implied qual.
Hi!
A partial hash index is not used if a qual is an implied qual,
since such a qual is not added to indrestrictinfo and we cannot
get the keys needed to make a hash index scan possible.
The suggested fix is to add the implied qual to indrestrictinfo
for indexes that require the presence of a key to scan the index.
How to repeat:
CREATE TABLE hash_partial(x) AS
SELECT x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
EXPLAIN (COSTS OFF) SELECT x FROM hash_partial WHERE x = 1;
...
QUERY PLAN
--------------------------
Seq Scan on hash_partial
Filter: (x = 1)
...
Regards,
Sergei Glukhov
Attachment: partial_hash_index_with_implied_qual.patch (text/x-patch)
commit 89e8f5db08b7cf045b743b7d52b0b3754afff405
Author: Sergei Glukhov <s.glukhov@postgrespro.ru>
Date: Mon Nov 24 11:12:46 2025 +0400
Partial hash index is not used for implied quals.
Partial hash index is not used if qual is an implied qual
since this qual is not added to indrestrictinfo and we cannot
get the keys needed to make hash index scan possible.
Suggested fix is to add implied qual to indrestrictinfo
for the indexes that require a key to scan the index.
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c62e3f87724..0445bbf8775 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4059,8 +4059,10 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
/* predicate_implied_by() assumes first arg is immutable */
if (contain_mutable_functions((Node *) rinfo->clause) ||
- !predicate_implied_by(list_make1(rinfo->clause),
- index->indpred, false))
+ (index->amoptionalkey &&
+ !predicate_implied_by(list_make1(rinfo->clause),
+ index->indpred, false)) ||
+ !index->amoptionalkey)
index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);
}
}
diff --git a/src/test/regress/expected/hash_index.out b/src/test/regress/expected/hash_index.out
index 0d4bdb2adef..5213742b199 100644
--- a/src/test/regress/expected/hash_index.out
+++ b/src/test/regress/expected/hash_index.out
@@ -333,3 +333,22 @@ CREATE INDEX hash_f8_index2 ON hash_f8_heap USING hash (random float8_ops)
WITH (fillfactor=101);
ERROR: value 101 out of bounds for option "fillfactor"
DETAIL: Valid values are between "10" and "100".
+-- Partial hash index with an implied qual.
+CREATE TABLE hash_partial(x) AS
+ SELECT x as y from generate_series(1, 1000) as x;
+ANALYZE hash_partial;
+CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
+EXPLAIN (COSTS OFF) SELECT x FROM hash_partial WHERE x = 1;
+ QUERY PLAN
+----------------------------------------------
+ Index Scan using partial_idx on hash_partial
+ Index Cond: (x = 1)
+(2 rows)
+
+SELECT x FROM hash_partial WHERE x = 1;
+ x
+---
+ 1
+(1 row)
+
+DROP TABLE hash_partial;
diff --git a/src/test/regress/sql/hash_index.sql b/src/test/regress/sql/hash_index.sql
index 219da829816..f4f1d195219 100644
--- a/src/test/regress/sql/hash_index.sql
+++ b/src/test/regress/sql/hash_index.sql
@@ -321,3 +321,13 @@ CREATE INDEX hash_f8_index2 ON hash_f8_heap USING hash (random float8_ops)
WITH (fillfactor=9);
CREATE INDEX hash_f8_index2 ON hash_f8_heap USING hash (random float8_ops)
WITH (fillfactor=101);
+
+-- Partial hash index with an implied qual.
+CREATE TABLE hash_partial(x) AS
+ SELECT x as y from generate_series(1, 1000) as x;
+ANALYZE hash_partial;
+CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
+EXPLAIN (COSTS OFF) SELECT x FROM hash_partial WHERE x = 1;
+SELECT x FROM hash_partial WHERE x = 1;
+
+DROP TABLE hash_partial;
On Mon, 24 Nov 2025 at 20:33, Sergei Glukhov <s.glukhov@postgrespro.ru> wrote:
A partial hash index is not used if a qual is an implied qual,
since such a qual is not added to indrestrictinfo and we cannot
get the keys needed to make a hash index scan possible.
The suggested fix is to add the implied qual to indrestrictinfo
for indexes that require the presence of a key to scan the index.
How to repeat:
That's not very good. I see we abort building the index path at:
/*
* If no clauses match the first index column, check for amoptionalkey
* restriction. We can't generate a scan over an index with
* amoptionalkey = false unless there's at least one index clause.
* (When working on columns after the first, this test cannot fail. It
* is always okay for columns after the first to not have any
* clauses.)
*/
if (index_clauses == NIL && !index->amoptionalkey)
return NIL;
and that's there due to the fact that Hash doesn't support full
clauseless scans. Without the above, you'd get:
SELECT x FROM hash_partial WHERE x = 1;
ERROR: hash indexes do not support whole-index scans
so, that leads me to believe the location you're adjusting is probably
the best place to fix this issue.
As for the patch, you didn't update the comment to include the reason
you're keeping the restrictinfo clause. That's not great. I think you
should break that new test out into a new "if" test, maybe like:
/*
* Keep the restrictinfo for non-amoptionalkey index types as
* dropping the clause could result in having no clauses to use to
* scan the index. That's unsupported by non-amoptionalkey
* indexes, so if we dropped this qual, we'd fail to build a Path
* for this index later in planning.
*/
if (!index->amoptionalkey)
index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);
/* predicate_implied_by() assumes first arg is immutable */
else if (contain_mutable_functions((Node *) rinfo->clause) ||
!predicate_implied_by(list_make1(rinfo->clause),
index->indpred, false))
index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);
David
David Rowley <dgrowleyml@gmail.com> writes:
so, that leads me to believe the location you're adjusting is probably
the best place to fix this issue.
Wouldn't it be better to handle it more like the is_target_rel logic
a few lines further up? I also object to putting the test between
the contain_mutable_functions and predicate_implied_by calls; that's
both confusing and probably wrong. We're only calling
contain_mutable_functions to guard an assumption that
predicate_implied_by makes.
A larger point is that I think leaving such quals in indrestrictinfo
probably distorts our estimates of indexscan costs: we are likely to
think they contribute selectivity when they don't. Maybe that's a
problem to address separately, but it should be looked at. We skated
past the same problem for is_target_rel cases on the grounds that that
consideration affects all indexes on the rel equally; but as proposed,
this will probably result in an improper bias towards a partial hash
index.
regards, tom lane
I wrote:
Wouldn't it be better to handle it more like the is_target_rel logic
a few lines further up?
Actually, after thinking a bit longer, it'd be better to do something
like the attached so that we don't keep redundant quals unless they'd
*all* be excluded.
There's definitely something fishy about the costing though.
I experimented with this variant of Sergei's example:
regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
SELECT 1000
regression=# ANALYZE hash_partial;
ANALYZE
regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
CREATE INDEX
regression=# set enable_seqscan TO 0; -- else we'll go for a seqscan
SET
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on hash_partial (cost=24.08..32.56 rows=10 width=4)
Recheck Cond: (x = 1)
-> Bitmap Index Scan on partial_idx (cost=0.00..24.07 rows=10 width=0)
Index Cond: (x = 1)
(4 rows)
regression=# drop index partial_idx;
DROP INDEX
regression=# CREATE INDEX ON hash_partial USING hash(x);
CREATE INDEX
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
QUERY PLAN
----------------------------------------------------------------------------------
Bitmap Heap Scan on hash_partial (cost=4.08..12.56 rows=10 width=4)
Recheck Cond: (x = 1)
-> Bitmap Index Scan on hash_partial_x_idx (cost=0.00..4.08 rows=10 width=0)
Index Cond: (x = 1)
(4 rows)
Why are we thinking that a non-partial index would be substantially
cheaper to scan? That seems surely wrong, and it runs counter to my
intuition about why this fix is incomplete. (I expected an unfair
bias towards the partial index, not against it.)
regards, tom lane
Attachment: wip-fix-partial-hash-index-scan.patch (text/x-diff)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c62e3f87724..c2687dec425 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4038,6 +4038,7 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
foreach(lc, rel->indexlist)
{
IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+ List *newrestrictinfo;
ListCell *lcr;
if (index->indpred == NIL)
@@ -4051,8 +4052,8 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
if (is_target_rel)
continue;
- /* Else compute indrestrictinfo as the non-implied quals */
- index->indrestrictinfo = NIL;
+ /* Else compute new indrestrictinfo as the non-implied quals */
+ newrestrictinfo = NIL;
foreach(lcr, rel->baserestrictinfo)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);
@@ -4061,8 +4062,18 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
if (contain_mutable_functions((Node *) rinfo->clause) ||
!predicate_implied_by(list_make1(rinfo->clause),
index->indpred, false))
- index->indrestrictinfo = lappend(index->indrestrictinfo, rinfo);
+ newrestrictinfo = lappend(newrestrictinfo, rinfo);
}
+
+ /*
+ * If we excluded every qual as implied by the predicate, and the
+ * index cannot do full-index scans, then it's better to leave
+ * indrestrictinfo alone so that we can still build a scan on this
+ * index. XXX We will underestimate the cost of such an indexscan;
+ * what can we do about that?
+ */
+ if (!(newrestrictinfo == NIL && !index->amoptionalkey))
+ index->indrestrictinfo = newrestrictinfo;
}
}
Hi,
On 11/25/25 6:01 AM, Tom Lane wrote:
I wrote:
Wouldn't it be better to handle it more like the is_target_rel logic
a few lines further up?
Actually, after thinking a bit longer, it'd be better to do something
like the attached so that we don't keep redundant quals unless they'd
*all* be excluded.
There's definitely something fishy about the costing though.
I experimented with this variant of Sergei's example:
regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
SELECT 1000
regression=# ANALYZE hash_partial;
ANALYZE
regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
CREATE INDEX
regression=# set enable_seqscan TO 0; -- else we'll go for a seqscan
SET
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on hash_partial (cost=24.08..32.56 rows=10 width=4)
Recheck Cond: (x = 1)
-> Bitmap Index Scan on partial_idx (cost=0.00..24.07 rows=10 width=0)
Index Cond: (x = 1)
(4 rows)
regression=# drop index partial_idx;
DROP INDEX
regression=# CREATE INDEX ON hash_partial USING hash(x);
CREATE INDEX
regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
QUERY PLAN
----------------------------------------------------------------------------------
Bitmap Heap Scan on hash_partial (cost=4.08..12.56 rows=10 width=4)
Recheck Cond: (x = 1)
-> Bitmap Index Scan on hash_partial_x_idx (cost=0.00..4.08 rows=10 width=0)
Index Cond: (x = 1)
(4 rows)
Why are we thinking that a non-partial index would be substantially
cheaper to scan? That seems surely wrong, and it runs counter to my
intuition about why this fix is incomplete. (I expected an unfair
bias towards the partial index, not against it.)
regards, tom lane
Thanks for the fix. It seems there is another case for investigation:
DROP TABLE hash_partial;
CREATE TABLE hash_partial(x, y) AS
SELECT x, x + x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
SET enable_seqscan TO 0;
EXPLAIN SELECT x FROM hash_partial WHERE x = 1 and y < 0;
--------------------------------------------------------------------------------
Seq Scan on hash_partial (cost=0.00..23.00 rows=1 width=4)
Disabled: true
Filter: ((y < 0) AND (x = 1))
(3 rows)
Regarding the strangeness of the cost: the cost depends on
numIndexPages, and in genericcostestimate() we calculate numIndexPages as:
numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);
For the non-partial index, index->pages = 6 and index->tuples = 1000,
while for the partial index, index->pages = 6 and index->tuples = 10.
The number of pages depends on the index relation size, and the
initial size is 6 * BLCKSZ for both the partial and non-partial hash indexes.
The initial size of the hash index relation, in turn,
depends on the total number of tuples in the table.
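Sergei's numbers can be sanity-checked with a quick sketch of the same
formula (a toy reimplementation for illustration only, not PostgreSQL code;
the names mirror the genericcostestimate() variables quoted above):

```python
import math

def num_index_pages(num_index_tuples, index_pages, index_tuples):
    # Mirrors: numIndexPages = ceil(numIndexTuples * index->pages / index->tuples)
    return math.ceil(num_index_tuples * index_pages / index_tuples)

# Non-partial index: 10 matching tuples out of 1000 stored, 6 pages.
print(num_index_pages(10, 6, 1000))  # -> 1

# Partial index: all 10 stored tuples match, 6 pages.
print(num_index_pages(10, 6, 10))    # -> 6
```

which reproduces the six-versus-one page estimate behind the cost gap
between the two bitmap index scans shown earlier.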
Regards,
Sergei Glukhov
On Tue, 25 Nov 2025 at 15:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Actually, after thinking a bit longer, it'd be better to do something
like the attached so that we don't keep redundant quals unless they'd
*all* be excluded.
I think your 1st patch was more along the correct lines. With the 2nd
one, I think you're maybe assuming that the non-empty newrestrictinfo
must contain an indexable qual, but the list we're working with at
that point is the rel's baserestrictinfo, which could contain a bunch
of stuff that'll never match to the index. If you continue to remove
the implied qual when there's a non-indexable base qual in the list,
then we'll still have the same issue that Sergei is trying to fix.
Just try adding an unrelated qual with your 2nd patch. Something like:
select * from hash_partial where x=1 and ctid >= '(0,0)';
I think you might have tried the 2nd method because you didn't see the
point in adding >1 indexable qual to scan the index with when we just
want ==1. If you wanted to do that, then maybe match_clause_to_index()
would be the place to ensure the list_length() doesn't go above 1 for
non-amoptionalkey indexes. Is that really worthwhile? hash indexes
only support equality anyway, so in what scenario would there be more
than 1 qual?
David
David Rowley <dgrowleyml@gmail.com> writes:
On Tue, 25 Nov 2025 at 15:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Actually, after thinking a bit longer, it'd be better to do something
like the attached so that we don't keep redundant quals unless they'd
*all* be excluded.
I think your 1st patch was more along the correct lines. With the 2nd
one, I think you're maybe assuming that the non-empty newrestrictinfo
must contain an indexable qual, but the list we're working with at
that point is the rel's baserestrictinfo, which could contain a bunch
of stuff that'll never match to the index. If you continue to remove
the implied qual when there's a non-indexable base qual in the list,
then we'll still have the same issue that Sergei is trying to fix.
Right, as Sergei also noted. We should just do it as attached.
I checked the costing calculations and it's basically that
genericcostestimate doesn't understand about hash buckets.
For the partial index, it correctly estimates that we'll visit
all 10 of the tuples in the index, so it supposes that that
means we'll visit all of the index's pages. For the non-partial
index, it correctly estimates that we'll visit 10 of the 1000
tuples in the index, so it supposes that that means we'll visit
1/100th of the index's pages (rounded up to 1). This error is
compounded by the toy example, which only has 6 pages in either index
(the minimum size of a hash index, I think). There may or may not be
anything we can usefully do to improve that situation ... and for that
matter, it's not clear that preferring the partial index would really
be a win. As constructed, this test case has only one hash value in
the partial index, which I think is not exactly a case where you want
a hash index. I tried scaling up the table size, and got a badly
bloated partial index (about half as big as the non-partial one),
which seems to indicate that the code is vainly splitting and
re-splitting trying to divide that one huge bucket.
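The one-huge-bucket effect is easy to see in miniature (a toy model, not
PostgreSQL's actual hash function or bucket-split machinery): no matter how
many times you split, entries that all share a single key value can only
ever occupy one bucket.

```python
# Toy model: equal keys always map to the same bucket, so adding
# buckets (splitting) cannot spread them out.
def bucket(key, nbuckets):
    return hash(key) % nbuckets

keys = [1] * 1000  # the partial index stores only rows with x = 1

for nbuckets in (4, 8, 16, 32):
    occupied = {bucket(k, nbuckets) for k in keys}
    print(f"{nbuckets} buckets -> {len(occupied)} occupied")  # always 1
```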
So I'm inclined to apply the attached and just call it good.
Should we back-patch? I'm unsure. Clearly it's a bug that we
cannot generate an indexscan plan in this case, but we've learned
that changing plans in released branches is often not wanted.
And given the lack of field complaints, nobody is using the case
anyway.
regards, tom lane
Attachment: v2-fix-partial-hash-index-scan.patch (text/x-diff)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c62e3f87724..2654c59c4c6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4051,6 +4051,16 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
if (is_target_rel)
continue;
+ /*
+ * If index is !amoptionalkey, also leave indrestrictinfo as set
+ * above. Otherwise we risk removing all quals for the first index
+ * key and then not being able to generate an indexscan at all. It
+ * would be better to be more selective, but we've not yet identified
+ * which if any of the quals match the first index key.
+ */
+ if (!index->amoptionalkey)
+ continue;
+
/* Else compute indrestrictinfo as the non-implied quals */
index->indrestrictinfo = NIL;
foreach(lcr, rel->baserestrictinfo)
On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I checked the costing calculations and it's basically that
genericcostestimate doesn't understand about hash buckets.
For the partial index, it correctly estimates that we'll visit
all 10 of the tuples in the index, so it supposes that that
means we'll visit all of the index's pages. For the non-partial
index, it correctly estimates that we'll visit 10 of the 1000
I assume you must mean using your "x % 100" case rather than Sergei's case.
tuples in the index, so it supposes that that means we'll visit
1/100th of the index's pages (rounded up to 1). This error is
compounded by the toy example, which only has 6 pages in either index
(the minimum size of a hash index, I think). There may or may not be
anything we can usefully do to improve that situation
I'm not so clear on why this is bad. The index has 1000 tuples on 6
pages and we want to scan 1% of the rows in the index. As a result,
genericcostestimate() calculates that's 1 page. How many pages would
you expect to scan?
... and for that
matter, it's not clear that preferring the partial index would really
be a win. As constructed, this test case has only one hash value in
the partial index, which I think is not exactly a case where you want
a hash index. I tried scaling up the table size, and got a badly
bloated partial index (about half as big as the non-partial one),
which seems to indicate that the code is vainly splitting and
re-splitting trying to divide that one huge bucket.
I assumed this was some attempt at finding a cheap way to find rows
matching the index's predicate without having to have an index on all
rows.
So I'm inclined to apply the attached and just call it good.
I think the patch looks fine.
Should we back-patch? I'm unsure. Clearly it's a bug that we
cannot generate an indexscan plan in this case, but we've learned
that changing plans in released branches is often not wanted.
And given the lack of field complaints, nobody is using the case
anyway.
I feel like anyone adding a partial hash index has done so quite
purposefully. I suspect they might be surprised if there's no means
whatsoever to use that index in scans, so perhaps it's ok to
backpatch.
Sergei, can you confirm whether this was something you noticed while playing
around on master, or if this came from a field report?
David
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I checked the costing calculations and it's basically that
genericcostestimate doesn't understand about hash buckets.
I assume you must mean using your "x % 100" case rather than Sergei's case.
Right.
... tuples in the index, so it supposes that that means we'll visit
1/100th of the index's pages (rounded up to 1).
I'm not so clear on why this is bad. The index has 1000 tuples on 6
pages and we want to scan 1% of the rows in the index. As a result,
genericcostestimate() calculates that's 1 page. How many pages would
you expect to scan?
To be clear, neither of genericcostestimate's estimates are bad as
a generic estimate. And I'm not even sure that we could do better
with a hash-aware estimate implemented in hashcostestimate. I think
the key thing about this test case is that the partial index ends
up with all its entries in one hash bucket. I'm not sure that we
could reasonably expect to know that in the cost estimator. Even
if we could, should we really expend a lot of effort on the case?
It's a textbook example of when you should not use a hash index.
I'm interested in fixing this can't-generate-this-plan-shape bug
because it probably impacts more-realistic use-cases. But I
think the cost estimation problem is probably specific to cases
where you shouldn't have used a hash index, so I'm okay with
ignoring that part. Until more evidence arrives, anyway.
regards, tom lane
Hi,
On 11/27/25 7:01 AM, David Rowley wrote:
On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So I'm inclined to apply the attached and just call it good.
I think the patch looks fine.
+1, verified, thanks a lot!
Should we back-patch? I'm unsure. Clearly it's a bug that we
cannot generate an indexscan plan in this case, but we've learned
that changing plans in released branches is often not wanted.
And given the lack of field complaints, nobody is using the case
anyway.
I feel like anyone adding a partial hash index has done so quite
purposefully. I suspect they might be surprised if there's no means
whatsoever to use that index in scans, so perhaps it's ok to
backpatch.
Sergei, can you confirm whether this was something you noticed while playing
around on master, or if this came from a field report?
It was reported for v16.
Regards,
Sergei Glukhov
Sergei Glukhov <s.glukhov@postgrespro.ru> writes:
On 11/27/25 7:01 AM, David Rowley wrote:
On Thu, 27 Nov 2025 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Should we back-patch? I'm unsure. Clearly it's a bug that we
cannot generate an indexscan plan in this case, but we've learned
that changing plans in released branches is often not wanted.
Sergei, can you confirm whether this was something you noticed while playing
around on master, or if this came from a field report?
It was reported for v16.
OK, I'll backpatch then.
regards, tom lane