Index-only scans vs. partially-retrievable indexes

Started by Tom Laneover 4 years ago5 messageshackers

tgl@sss.pgh.pa.us

over 4 years ago

Yesterday I pushed what I thought was a quick fix for bug #17350 [1]/messages/by-id/17350-b5bdcf476e5badbb@postgresql.org.
In short, if we have an index that stores both "x" and "f(x)",
where the "x" column can be retrieved in index-only scans but "f(x)"
cannot, it's possible for the planner to generate an IOS plan that
nonetheless tries to read the f(x) index column. The bug report
concerns the case where f(x) is needed in the IOS plan node's targetlist,
and I did fix that --- but I now realize that we still have a problem
with respect to rechecks of the plan node's indexquals. Here's
an example:

regression=# create extension pg_trgm;
CREATE EXTENSION
regression=# create table t(a text);
CREATE TABLE
regression=# create index on t using gist(lower(a) gist_trgm_ops) include (a);
CREATE INDEX
regression=# insert into t values('zed');
INSERT 0 1
regression=# insert into t values('z');
INSERT 0 1
regression=# select * from t where lower(a) like 'z';
a
---
z
(1 row)

That's the correct answer, but we're using a bitmap scan to get it.
If we force an IOS plan:

regression=# set enable_bitmapscan = 0;
SET
regression=# explain select * from t where lower(a) like 'z';
QUERY PLAN
------------------------------------------------------------------------------
Index Only Scan using t_lower_a_idx on t (cost=0.14..28.27 rows=7 width=32)
Index Cond: ((lower(a)) ~~ 'z'::text)
(2 rows)

regression=# select * from t where lower(a) like 'z';
a
---
(0 rows)

That's from a build a few days old. As of HEAD it's even worse;
not only do we fail to return the rows we should, but EXPLAIN says

regression=# explain select * from t where lower(a) like 'z';
QUERY PLAN
------------------------------------------------------------------------------
Index Only Scan using t_lower_a_idx on t (cost=0.14..28.27 rows=7 width=32)
Index Cond: ((NULL::text) ~~ 'z'::text)
(2 rows)

At least this is showing us what's happening: the index recheck condition
sees a NULL for the value of lower(a). That's because it's trying to
get the value of lower(a) out of the index, instead of recomputing it
from the value of a.

AFAICS this has been broken since 9.5 allowed indexes to contain
both retrievable and non-retrievable columns, so it's a bit surprising
that it hasn't been reported before. I suppose that the case was
harder to hit before we introduced INCLUDE columns. The relevant
code actually claims that it's impossible:

/*
* If the index was lossy, we have to recheck the index quals.
* (Currently, this can never happen, but we should support the case
* for possible future use, eg with GiST indexes.)
*/
if (scandesc->xs_recheck)
{
econtext->ecxt_scantuple = slot;
if (!ExecQualAndReset(node->indexqual, econtext))
{
/* Fails recheck, so drop it and loop back for another */
InstrCountFiltered2(node, 1);
continue;
}
}

That comment may have been true when written (it dates to 9.2) but
it's demonstrably not true now; the test case I just gave traverses
this code, and gets the wrong answer.

I don't think there is any way to fix this that doesn't involve
adding another field to structs IndexOnlyScan and IndexOnlyScanState.
We need a version of the indexqual that references the retrievable
index column x and computes f(x) from that, but the indexqual that's
passed to the index AM still has to reference the f(x) index column.
That's annoying from an API stability standpoint. In the back
branches, we can add the new fields at the end to minimize ABI
breakage, but we will still be breaking any extension code that thinks
it knows how to generate an IndexOnlyScan node directly. (But maybe
there isn't any. The Path representation doesn't need to change, so
typical planner extensions should be OK.)

Unless somebody's got a better idea, I'll push forward with making
this happen.

regards, tom lane

[1]: /messages/by-id/17350-b5bdcf476e5badbb@postgresql.org

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Tom Lane (#1)

Re: Index-only scans vs. partially-retrievable indexes

I wrote:

Unless somebody's got a better idea, I'll push forward with making
this happen.

Here's a proposed patch for that. I ended up reverting the code
changes of 4ace45677 in favor of an alternative I'd considered
previously, which is to mark the indextlist elements as resjunk
if they're non-retrievable, and then make setrefs.c deal with
not relying on those elements. This avoids the EXPLAIN breakage
I showed, since now we still have the indextlist elements needed
to interpret the indexqual and indexorderby expressions.

0001 is what I propose to back-patch (modulo putting the new
IndexOnlyScan.recheckqual field at the end, in the back branches).

In addition, it seems to me that we can simplify check_index_only()
by reverting b5febc1d1's changes, because we've now been forced to
put in a non-half-baked solution for the problem it addressed.
That's 0002 attached. I'd be inclined not to back-patch that though.

regards, tom lane

Andrey Borodin

amborodin@acm.org

over 4 years ago

In reply to: Tom Lane (#1)

Re: Index-only scans vs. partially-retrievable indexes

regression=# explain select * from t where lower(a) like 'z';
QUERY PLAN
------------------------------------------------------------------------------
Index Only Scan using t_lower_a_idx on t (cost=0.14..28.27 rows=7 width=32)
Index Cond: ((lower(a)) ~~ 'z'::text)
(2 rows)

I've tried to toy with the patch and remembered one related caveat.
If we have index for both returnable and nonreturnable attributes, IOS will not be choosen:

postgres=# create index on t using gist(a gist_trgm_ops) include (a);
postgres=# explain select * from t where a like 'z';
QUERY PLAN
---------------------------------------------------------------------
Index Scan using t_a_a1_idx on t (cost=0.12..8.14 rows=1 width=32)
Index Cond: (a ~~ 'z'::text)
(2 rows)

But with index
create index on t using gist(lower(a) gist_trgm_ops) include (a);
I observe IOS for
select * from t where lower(a) like 'z';

So lossiness of opclass kind of "defeats" returnable attribute. But lossiness of expression does not. I don't feel condifent in surrounding code to say is it a bug or just a lack of feature. But maybe we would like to have equal behavior in both cases...

Thanks!

Best regards, Andrey Borodin.

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Andrey Borodin (#3)

Re: Index-only scans vs. partially-retrievable indexes

Andrey Borodin <x4mmm@yandex-team.ru> writes:

I've tried to toy with the patch and remembered one related caveat.
If we have index for both returnable and nonreturnable attributes, IOS will not be choosen:

postgres=# create index on t using gist(a gist_trgm_ops) include (a);
postgres=# explain select * from t where a like 'z';
QUERY PLAN
---------------------------------------------------------------------
Index Scan using t_a_a1_idx on t (cost=0.12..8.14 rows=1 width=32)
Index Cond: (a ~~ 'z'::text)
(2 rows)

This case is improved by 0002, no?

regards, tom lane

Andrey Borodin

amborodin@acm.org

over 4 years ago

In reply to: Tom Lane (#4)

Re: Index-only scans vs. partially-retrievable indexes

Andrey Borodin <x4mmm@yandex-team.ru> writes:

I've tried to toy with the patch and remembered one related caveat.
If we have index for both returnable and nonreturnable attributes, IOS will not be choosen:

postgres=# create index on t using gist(a gist_trgm_ops) include (a);
postgres=# explain select * from t where a like 'z';
QUERY PLAN
---------------------------------------------------------------------
Index Scan using t_a_a1_idx on t (cost=0.12..8.14 rows=1 width=32)
Index Cond: (a ~~ 'z'::text)
(2 rows)

This case is improved by 0002, no?

Uhmm, yes, you are right. Works as expected with the second patch.
I tried first patch against this before writing. But did not expect much from a revert...

Thanks you!

Best regards, Andrey Borodin.

Index-only scans vs. partially-retrievable indexes

Attachments: