PATCH: index-only scans with partial indexes
Hi,
currently partial indexes end up not using index only scans in most
cases, because check_index_only() is overly conservative, as explained
in this comment:
* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.
In other words, unless you include the columns from the index predicate
in the index, the planner will decide index only scans are not possible.
That's a bit unfortunate, because those columns are not needed at
runtime, and only increase the index size (and the main benefit of
partial indexes is size reduction).
The attached patch fixes this by only considering clauses that are not
implied by the index predicate. The effect is simple:
create table t as select i as a, i as b from
generate_series(1,10000000) s(i);
create index tidx_partial on t(b) where a > 1000 and a < 2000;
vacuum freeze t;
analyze t;
-- on master (unpatched):
explain analyze select count(b) from t where a > 1000 and a < 2000;
QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=39.44..39.45 rows=1 width=4)
(actual time=8.350..8.354 rows=1 loops=1)
-> Index Scan using tidx_partial on t
(cost=0.28..37.98 rows=585 width=4)
(actual time=0.034..4.368 rows=999 loops=1)
Planning time: 0.197 ms
Execution time: 8.441 ms
(4 rows)
-- with the patch applied:
explain analyze select count(b) from t where a > 1000 and a < 2000;
QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=33.44..33.45 rows=1 width=4)
(actual time=8.019..8.023 rows=1 loops=1)
-> Index Only Scan using tidx_partial on t
(cost=0.28..31.98 rows=585 width=4)
(actual time=0.036..4.165 rows=999 loops=1)
Heap Fetches: 0
Planning time: 0.188 ms
Execution time: 8.106 ms
(5 rows)
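The effect of the patched check can be modeled abstractly: an index-only scan becomes possible when every attribute the query touches is either stored in the index, or appears only in clauses implied by the index predicate. A minimal illustrative sketch (Python, with invented names; not the actual C logic of check_index_only()):

```python
def index_only_possible(needed_attrs, clauses, index_attrs, implied_clauses):
    """Illustrative model: decide whether an index-only scan is possible.

    needed_attrs     -- attributes the query must output
    clauses          -- dict mapping each WHERE clause to the attrs it uses
    index_attrs      -- attributes actually stored in the index
    implied_clauses  -- clauses provably implied by the index predicate
    """
    required = set(needed_attrs)
    for clause, attrs in clauses.items():
        if clause in implied_clauses:
            continue        # patched behavior: no runtime check needed
        required |= attrs   # clause must be evaluated, so its attrs are needed
    return required <= set(index_attrs)

# The example above: index t(b) WHERE a > 1000 AND a < 2000
where = {'a > 1000': {'a'}, 'a < 2000': {'a'}}
print(index_only_possible({'b'}, where, {'b'}, set(where)))  # patched: True
print(index_only_possible({'b'}, where, {'b'}, set()))       # master: False
```

With the implied clauses ignored, only 'b' remains required and it is stored in the index; on master both clauses force 'a' into the required set, disabling the IOS.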
I've done a bunch of tests, and I do see a small (hardly noticeable)
increase in planning time with long lists of WHERE clauses, because all
of those need to be checked against the index predicate. I'm not sure if
this is what's meant by 'quite expensive' in the comment. Moreover, this
was more than compensated for by the IOS benefits (even with everything
in RAM).
But maybe it's possible to fix that somehow? For example, we're
certainly doing those checks elsewhere when deciding which clauses need
to be evaluated at run-time, so maybe we could cache that somehow?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
partial-index-only-scan-v1.patch (text/x-diff)
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
currently partial indexes end up not using index only scans in most
cases, because check_index_only() is overly conservative, as explained
in this comment:
...
I've done a bunch of tests, and I do see small (hardly noticeable)
increase in planning time with long list of WHERE clauses, because all
those need to be checked against the index predicate. Not sure if this
is what's meant by 'quite expensive' in the comment. Moreover, this was
more than compensated by the IOS benefits (even with everything in RAM).
But maybe it's possible to fix that somehow? For example, we're
certainly doing those checks elsewhere when deciding which clauses need
to be evaluated at run-time, so maybe we could cache that somehow?
The key problem here is that you're doing those proofs vastly earlier than
before, for indexes that might not get used at all in the final plan.
If you do some tests with multiple partial indexes you will probably see
a bigger planning-time penalty.
Perhaps we should bite the bullet and do it anyway, but I'm pretty
suspicious of any claim that the planning cost is minimal.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 07/10/2015 10:43 PM, Tom Lane wrote:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
currently partial indexes end up not using index only scans in most
cases, because check_index_only() is overly conservative, as explained
in this comment:
...I've done a bunch of tests, and I do see small (hardly noticeable)
increase in planning time with long list of WHERE clauses, because all
those need to be checked against the index predicate. Not sure if this
is what's meant by 'quite expensive' in the comment. Moreover, this was
more than compensated by the IOS benefits (even with everything in RAM).
But maybe it's possible to fix that somehow? For example, we're
certainly doing those checks elsewhere when deciding which clauses need
to be evaluated at run-time, so maybe we could cache that somehow?

The key problem here is that you're doing those proofs vastly earlier
than before, for indexes that might not get used at all in the final
plan. If you do some tests with multiple partial indexes you will
probably see a bigger planning-time penalty.
Hmmm. Maybe we could get a bit smarter by looking at the attnums of each
clause before doing the expensive stuff (which is predicate_implied_by I
believe), exploiting a few simple observations:
* if the clause is already covered by attrs_used, we don't need to
process it at all
* if the clause uses attributes not included in the index predicate,
we know it can't be implied
Of course, those are local optimizations, and can't fix some of the
problems (e.g. a lot of partial indexes).
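Those two observations amount to a cheap pre-filter in front of the expensive proof. A rough Python sketch (function and parameter names are invented for illustration; the real code would work with attnum bitmapsets):

```python
def needs_implication_proof(clause_attrs, attrs_used, predicate_attrs):
    """Cheap attnum-based pre-filter before calling predicate_implied_by().

    clause_attrs    -- attributes referenced by the WHERE clause
    attrs_used      -- attributes already known to be fetched anyway
    predicate_attrs -- attributes referenced by the index predicate
    """
    if clause_attrs <= attrs_used:
        return False   # already covered: proving implication gains nothing
    if not clause_attrs <= predicate_attrs:
        return False   # uses attrs absent from the predicate: can't be implied
    return True        # only now pay for the expensive proof

print(needs_implication_proof({'a'}, {'a', 'b'}, {'a'}))  # False: covered
print(needs_implication_proof({'c'}, {'b'}, {'a'}))       # False: can't be implied
print(needs_implication_proof({'a'}, {'b'}, {'a'}))       # True: run the proof
```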
Perhaps we should bite the bullet and do it anyway, but I'm pretty
suspicious of any claim that the planning cost is minimal.
Perhaps - I'm not claiming the planning cost is minimal. It was in the
tests I've done, but no doubt it's possible to construct examples where
the planning time will get much worse. With 30 partial indexes, I got an
increase from 0.01 ms to ~2.5ms on simple queries.
But maybe we could get at least some of the benefits by planning the
index scans as we do today, and only doing the IOS check later? Of
course, this won't help with cases where the index scan is thrown away
while the index only scan would have won, but it does help with cases
where we end up doing an index scan anyway.
That's essentially what I'm struggling with right now - I have a 3TB
data set, and the plan looks like this:
QUERY PLAN
------------------------------------------------------------------------
Sort (cost=1003860164.92..1003860164.92 rows=1 width=16)
Sort Key: orders.o_orderpriority
-> HashAggregate
Group Key: orders.o_orderpriority
-> Merge Semi Join
Merge Cond:
-> Index Scan using pk_orders on orders
Filter: ((o_orderdate >= '1997-07-01'::date) AND
(o_orderdate < '1997-10-01 00:00:00'::timestamp))
-> Index Scan using lineitem_l_orderkey_idx_part1 on
lineitem
and the visibility checks from the Index Scans are killing the I/O. An
IOS is likely to perform much better here (but I haven't run the query
yet).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hi,
currently partial indexes end up not using index only scans in most
cases, because check_index_only() is overly conservative, as explained
in this comment:
...
The attached patch fixes this by only considering clauses that are not
implied by the index predicate.
However, "explain analyze select sum(b) from t where a > 1000 and a <
1999;" still doesn't use the index only
scan. Isn't that also implied by the predicate?
Cheers,
Jeff
25.08.2015 20:19, Jeff Janes wrote:
On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
...
However, "explain analyze select sum(b) from t where a > 1000 and a <
1999;" still doesn't use the index only
scan. Isn't that also implied by the predicate?
In this example it is correct that it doesn't use an IndexOnlyScan. If
I understand partial indexes right, when the index predicate and the
search clause are not equal, the index scan must recheck the values
while fetching them. 'tidx_partial' in the example above has no
information about the 'a' attribute besides index->indpred, so it is
impossible to recheck the qual without referencing the table.
For example, with:
create index tidx_partial on t(a) where a > 1000 and a < 2000;
explain analyze select sum(a) from t where a > 1000 and a < 1999;
it can use IndexOnlyScan.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Sep 4, 2015 at 4:28 AM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
...
Yes, of course. Thanks for the explanation, it is obvious now that you
have explained it. I kept slipping into thinking that the
predicate-dependent variables are included in the index but only when the
predicate is met, but that isn't the case.
How can we evaluate Tom's performance concerns? I tried turning
log_planner_stats on and using the regression tests as a load generator,
but I don't think that is a very demanding test.
Thanks,
Jeff
Hi,
On 09/04/2015 06:10 PM, Jeff Janes wrote:
How can we evaluate Tom's performance concerns? I tried turning
log_planner_stats on and using the regression tests as a load generator,
but I don't think that is a very demanding test.
I've done a bit of benchmarking today, trying to measure how expensive
the additional checks are.
Using a simple table with just 4 columns and 1M rows
CREATE TABLE t AS SELECT i AS a, i AS b, i AS c, i AS d
FROM generate_series(1,1000000) s(i);
with three different index sets:
- no indexes
- 40 regular indexes (indexes-1.sql)
- 40 partial indexes (indexes-2.sql)
and two different query sets:
- matching the partial indexes (queries-1.sql)
- not matching the partial indexes (queries-2.sql)
which means 6 combinations:
A: no indexes / queries-1
B: no indexes / queries-2
C: indexes-1 / queries-1
D: indexes-1 / queries-2
E: indexes-2 / queries-1
F: indexes-2 / queries-2
A summary of 100 EXPLAIN timings looks like this:
master A B C D E F
-------------------------------------------------------------------------
min 0.10 0.10 0.30 0.29 0.66 0.23
max 1.07 1.00 2.13 1.98 4.52 1.59
median 0.49 0.52 0.31 0.33 0.68 1.12
average 0.43 0.35 0.62 0.49 1.01 0.89
patched A B C D E F
-------------------------------------------------------------------------
min 0.11 0.11 0.29 0.29 0.70 0.22
max 0.99 1.05 0.55 1.93 3.79 1.12
median 0.19 0.55 0.32 0.34 0.74 0.24
average 0.42 0.52 0.34 0.55 0.95 0.27
A-D should be exactly the same, because there are no partial indexes,
and the results match that expectation.
E and F should be different, depending on how expensive the additional
checks are. But in this benchmark that's not true - the patched version
is actually a bit faster, thanks to noise.
I find that a bit strange, but I repeated the benchmark about 3x just to
verify it really behaves like this. Maybe I made some stupid mistake and
the results are useless, or maybe the benchmark needs to be more complex
(e.g. the conditions must not be exactly the same). So if someone could
rerun the benchmark and review it, that'd be nice.
Judging the cost/benefit ratio is a bit tricky. We need to identify the
cases where additional planning complexity makes it measurably slower,
without getting better performance at execution. And then we need to
somehow argue whether those cases are frequent enough or not.
ISTM that the worst case would be a data set with many partial indexes
that however don't allow IOS. And the amount of data would have to be
small, so that the queries don't take too much time (which would make
the additional planning time mere noise).
However that was the idea of the benchmark, and I got no difference.
regards
Tomas
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4 September 2015 at 22:03, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
A summary of 100 EXPLAIN timings looks like this:
master A B C D E F
-------------------------------------------------------------------------
min 0.10 0.10 0.30 0.29 0.66 0.23
max 1.07 1.00 2.13 1.98 4.52 1.59
median 0.49 0.52 0.31 0.33 0.68 1.12
average 0.43 0.35 0.62 0.49 1.01 0.89
What are these? Times? in ms?
However that was the idea of the benchmark, and I got no difference.
Please explain what this means and your conclusion, so it's clear. That
way we can either reject the patch or commit it. Thanks
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 09/05/2015 10:53 AM, Simon Riggs wrote:
On 4 September 2015 at 22:03, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

A summary of 100 EXPLAIN timings looks like this:
master A B C D E F
-------------------------------------------------------------------------
min 0.10 0.10 0.30 0.29 0.66 0.23
max 1.07 1.00 2.13 1.98 4.52 1.59
median 0.49 0.52 0.31 0.33 0.68 1.12
average 0.43 0.35 0.62 0.49 1.01 0.89

What are these? Times? in ms?
Yes, those are planning times in milliseconds. I've been thinking about
possible issues in the benchmark, and I ended up with two main suspects:
(a) environment - VM running on a laptop. thus quite noisy and
subject to various sources of overhead, power-management, etc.
(b) time measured using \timing in psql (by running EXPLAIN), so
probably influenced by formatting/transfer
So I reran the benchmark on a different machine (bare metal, pretty much
no noise in the results), and measured the planning time using EXPLAIN
ANALYZE (Planning Time). And I got this (milliseconds):
master A B C D E F
-----------------------------------------------------------------
min 0.04 0.04 0.11 0.10 0.37 0.12
max 0.10 0.10 0.92 0.92 1.62 1.23
median 0.04 0.04 0.11 0.11 0.37 0.13
average 0.04 0.04 0.11 0.11 0.38 0.14
patched A B C D E F
-----------------------------------------------------------------
min 0.04 0.04 0.11 0.11 0.38 0.13
max 0.10 0.10 0.92 0.94 1.64 1.21
median 0.04 0.04 0.11 0.11 0.39 0.13
average 0.04 0.04 0.11 0.12 0.40 0.14
So much lower numbers (better CPU, no virtualization, etc.), but
otherwise exactly the same conclusion - no overhead compared to master.
I can think of three ways to make the checks more expensive:
(a) using more indexes
The current benchmark already uses 40 indexes (and I've tried
with 100), and we've seen no impact at all. Adding more indexes
will eventually show some overhead, but the number of indexes
will be very high - I doubt anyone has a table with hundreds of
partial indexes on it.
(b) using more complex index predicates
I expect the predicate_implied_by() call to get more expensive
for more complex predicates. I however believe that's quite an
uncommon case - the vast majority of index predicates that I've
seen use just a single equality clause.
(c) using more complex queries (more WHERE conditions)
Having more complex WHERE clauses seems quite plausible, though,
so I've decided to try it. Instead of the simple query used
before:
select a from t where b >= 100 and b <= 200;
I've used a query with a bunch of other conditions:
select a from t where b >= 100 and b <= 200
and c >= 100 and c <= 200
and d >= 100 and d <= 200
and a >= 100 and a <= 100;
And indeed, this made a (tiny) difference - on the master, the
planning was 0.50 ms on average, while with the patch it was
0.55. But 0.05 ms is just barely above noise, even on this HW.
Of course, this only impacts the case with partial indexes, all
the other cases were exactly the same with and without patch.
However that was the idea of the benchmark, and I got no difference.
Please explain what this means and your conclusion, so it's clear. That
way we can either reject the patch or commit it. Thanks
That means I've been unable to measure any significant overhead of the
patch. There certainly are extreme cases where this patch might make the
planning noticeably slower, but I believe those are rather artificial,
and certainly wouldn't expect them in databases where a tiny increase of
planning time would be a problem.
This benchmark however only looked at the planning overhead, but we
should weigh that against the possible gains. And IOS is a great
optimization - it's not uncommon to see 2-3x improvements on databases
that fit into RAM, and order-of-magnitude improvements on large
databases (thanks to eliminating the random I/O when accessing the heap).
So my opinion is that we should commit this patch.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
That means I've been unable to measure any significant overhead
of the patch.
I've run a lot of benchmarks, and with anything resembling a common
query the differences in planning time are lost in the noise. (I
didn't create a better example than Tomas of where a lot of indexes
cause a minimal increase in planning time.)
The test environment is a "bare iron" machine with:
1 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (4 cores; 8 HW threads)
16GB DDR3 RAM
2 1TB drives in RAID 1
ubuntu 14.04 LTS 64-bit
various checkouts from master, most recently a7212a99
no cassert, default optimizations
As one example, to get a heap bigger than RAM I set up like this:
drop table if exists t;
create table t (a int not null, b int not null, x text not null);
insert into t
select i, i, repeat(md5(i::text), 50)
from generate_series(1,10000000) s(i);
vacuum freeze t;
checkpoint;
I ran one-index tests like this:
create index t_b_partial on t(b) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose)
select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(*) from t where a > 1000 and a < 2000;
... then two-index tests like this:
create index t_b_a_partial on t(b, a) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose)
select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(*) from t where a > 1000 and a < 2000;
All queries were run 5 times and (to minimize stray slowdowns from
other sources on this desktop machine) I took the minimum plan time
and minimum execution time. (My browser and other optional
processes were stopped to also minimize noise, but the results
still had more noise than I would prefer.)
master - single index
---------------------
Planning time: 0.078 ms
Execution time: 0.544 ms
Planning time: 0.079 ms
Execution time: 0.533 ms
Planning time: 0.066 ms
Execution time: 0.491 ms
master - both indexes
---------------------
Planning time: 0.080 ms
Execution time: 0.396 ms
Planning time: 0.076 ms
Execution time: 0.373 ms
Planning time: 0.056 ms
Execution time: 0.275 ms
patched - single index
----------------------
Planning time: 0.032 ms
Execution time: 0.136 ms
Planning time: 0.079 ms
Execution time: 0.537 ms
Planning time: 0.050 ms
Execution time: 0.213 ms
patched - both indexes
----------------------
Planning time: 0.100 ms
Execution time: 0.373 ms
Planning time: 0.067 ms
Execution time: 0.251 ms
Planning time: 0.065 ms
Execution time: 0.240 ms
In my view, the most disappointing thing about the patch is that
when both indexes are present, it doesn't use the narrower one. If
*only* the narrower index is present, it runs the index-only scan
using that index for count(b) and count(*), which is faster. Can
we wrangle this patch into making a better choice among available
index-only scans?
It also seems disappointing that we don't recognize that
count(columnname) could be treated as a synonym for count(*) if
columnname is NOT NULL, but that doesn't seem like material for
this patch.
Benchmarking took so much time I did not get to a close review of
the code changes. :-( Based on just the test results, it appears
to me that the patch as it stands would be a net win for the vast
majority of workloads where it would make any noticeable difference,
and I didn't manage to create any big downside. I would like to
see whether we can't improve the choice of partial index when there
are multiple possibilities -- it seems quite surprising to see
identical estimates for indexes of different column counts and key
widths, and to see the wider index chosen when the narrow one is
clearly (and predictably) faster.
I am changing this to Waiting on Author.
I will be on vacation without Internet access for the next 15 days,
so hopefully someone else can have a look when a new version is
posted. If it's still open I'll have a look when I get back.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 09/13/2015 08:03 PM, Kevin Grittner wrote:
In my view, the most disappointing thing about the patch is that
when both indexes are present, it doesn't use the narrower one. If
*only* the narrower index is present, it runs the index-only scan
using that index for count(b) and count(*), which is faster. Can
we wrangle this patch into making a better choice among available
index-only scans?
That's indeed strange, but after poking into that for a while, it seems
rather like a costing issue. Let me demonstrate:
create table t (a int, b int);
insert into t select i,i from generate_series(1,1000000) s(i);
create index idx1 on t (a) where b between 300000 and 600000;
create index idx2 on t (a,b) where b between 300000 and 600000;
vacuum t;
analyze t;
explain select a from t where b between 300000 and 600000;
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx2 on t (cost=0.42..9236.88 rows=297823 width=4)
Index Cond: ((b >= 300000) AND (b <= 600000))
(2 rows)
drop index idx2;
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx1 on t (cost=0.42..9236.88 rows=297823 width=4)
(1 row)
Now, both plans are index only scans, but the first one has Index Cond
and the other one does not!
I've put a bunch of logging into cost_index(), and it turns out that
while the final cost is *exactly* the same, it's most likely by chance.
After we call amcostestimate, we get these two results:
idx1: indexStartupCost=0.422500 indexTotalCost=4769.537500
indexSelectivity=0.297823 indexCorrelation=1.000000
idx2: indexStartupCost=0.422500 indexTotalCost=6258.652500
indexSelectivity=0.297823 indexCorrelation=0.750000
So amcostestimate does make a difference, and we get
idx1: run_cost = 4769.115000
idx2: run_cost = 6258.230000
and then for both indexes
tuples_fetched=297823.000000
loop_count=1.000000
pages_fetched = 0.000000
but then we do
run_cost += cpu_per_tuple * tuples_fetched;
and we end up with
run_cost = 9236.460000
for both indexes. How's that possible? Number of tuples fetched is
exactly the same for both indexes (297823), so clearly cpu_per_tuple
must be different. That however seems a bit strange, because the only
difference between the indexes is the additional column, and the
condition should be covered by the index predicate ...
It seems that the problem is related to this:
qpquals
= extract_nonindex_conditions(baserel->baserestrictinfo,
path->indexquals);
while the "larger" index on (a,b) gets
path->indexquals=(b BETWEEN 300000 AND 600000)
qpquals=NIL
the "smaller" index on (a) gets
path->indexquals=NIL
qpquals=(b BETWEEN 300000 AND 600000)
And so the larger index gets qpqual_cost=0, the smaller one gets
qpqual_cost=0.005, and so cpu_per_tuple is either 0.01 or 0.015.
Which is exactly the difference between costs from amcostestimate
idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
idx2: 6258.230000 + 0.010 * 297823 = 9236.460000
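The coincidence is easy to verify with a couple of lines (numbers taken verbatim from the logging output above):

```python
# Quick sanity check of the cost arithmetic: the 0.005 difference in
# cpu_per_tuple across 297823 tuples exactly offsets the difference
# between the two run costs reported by amcostestimate.
tuples = 297823
idx1 = 4769.115 + 0.015 * tuples  # smaller index: quals charged per tuple
idx2 = 6258.230 + 0.010 * tuples  # larger index: quals charged by amcostestimate
print(round(idx1, 2), round(idx2, 2))  # 9236.46 9236.46
```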
Spooky! Although it seems like a mere coincidence, thanks to the nice
round numbers of tuples in the table, and the lucky choice of two conditions.
For example after replacing the BETWEEN condition (which is actually two
conditions) with a single one (b<300000) - both in the indexes and
query, I get this:
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx2 on t (cost=0.42..8541.25 rows=299507 width=4)
Index Cond: (b < 300000)
(2 rows)
drop index idx2;
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx1 on t (cost=0.42..8541.43 rows=299507 width=4)
(1 row)
The plans are not costed exactly the same anymore (I'm not saying the
costs are correct - clearly still the 'larger' index was preferred).
It's not specific to index only scans either - after adding another
column to the table, and requesting it from the query (thus preventing
an IOS), I get exactly the same issue.
I really wonder why we get different path->indexquals for those indexes,
because that's the root of the issue here. Any ideas?
It also seems disappointing that we don't recognize that
count(columnname) could be treated as a synonym for count(*) if
columnname is NOT NULL, but that doesn't seem like material for
this patch.
Yeah, that's really not what this patch deals with.
Benchmarking took so much time I did not get to a close review of
the code changes. :-( Based on just the test results, it appears
to me that the patch as it stands would be a net win for the vast
majority of workloads where it would make any noticeable difference,
and I didn't manage to create any big downside. I would like to
see whether we can't improve the choice of partial index when there
are multiple possibilities -- it seems quite surprising to see
identical estimates for indexes of different column counts and key
widths, and to see the wider index chosen when the narrow one is
clearly (and predictably) faster.

I am changing this to Waiting on Author.
I will be on vacation without Internet access for the next 15 days,
so hopefully someone else can have a look when a new version is
posted. If it's still open I'll have a look when I get back.
Thanks for the feedback!
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
At Sun, 13 Sep 2015 23:21:30 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55F5E8DA.8080303@2ndquadrant.com>
That's indeed strange, but after poking into that for a while, it
seems rather like a costing issue. Let me demonstrate:
...
Now, both plans are index only scans, but the first one has Index Cond
and the other one does not!
The seemingly removed Index Cond qual is counted among the non-index
quals at the end of cost_index(). The quals that the partial index
predicate implies should be ignored during cost estimation.
I've put a bunch of logging into cost_index(), and turns out that
while the final cost is *exactly* the same, it's most likely by
chance. After we call amcostestimate, we get these two results:
So it is *not by chance* but stable behavior determined by the
algorithm.
It seems that the problem is related to this:

    qpquals = extract_nonindex_conditions(baserel->baserestrictinfo,
                                          path->indexquals);

while the "larger" index on (a,b) gets

    path->indexquals = (b BETWEEN 300000 AND 600000)
    qpquals = NIL

the "smaller" index on (a) gets

    path->indexquals = NIL
    qpquals = (b BETWEEN 300000 AND 600000)

And so the larger index gets qpqual_cost=0, the smaller one gets
qpqual_cost=0.005, and so cpu_per_tuple is either 0.01 or 0.015.

Which is exactly the difference between costs from amcostestimate:

    idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
    idx2: 6258.230000 + 0.010 * 297823 = 9236.460000
These calculations are exactly right, but you overlooked the
breakdown of indexTotalCost for idx2.
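For the record, the matching totals can be reproduced with plain arithmetic (the figures are taken from the two amcostestimate results quoted above; this is just a sketch of the sums, not planner code):

```python
# Figures from the two amcostestimate results quoted above:
#   idx1 (on (a)):   amcostestimate total 4769.115, per-tuple 0.015
#                    (cpu_tuple_cost 0.01 + qpqual_cost.per_tuple 0.005)
#   idx2 (on (a,b)): amcostestimate total 6258.230, per-tuple 0.010
#                    (qpquals empty, so no extra per-tuple charge)
tuples_fetched = 297823

idx1_cost = 4769.115 + 0.015 * tuples_fetched
idx2_cost = 6258.230 + 0.010 * tuples_fetched

# Both come out to 9236.46: the larger amcostestimate total for idx2
# is exactly compensated by the smaller per-tuple charge.
print(idx1_cost, idx2_cost)
```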
Spooky! Although it seems like a mere coincidence, thanks to the nice
round numbers of tuples in the table, and lucky choice of two
conditions.

As said above, it is not a coincidence. Exactly the same
calculation on baserestrictinfo is simply performed in
different places: cost_index for the former and
btcostestimate (genericcostestimate) for the latter.
We should properly ignore or remove the implicitly-applied quals
for partial indexes on cost estimation.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 09/14/2015 09:35 AM, Kyotaro HORIGUCHI wrote:
Hi,
...
Which is exactly the difference between costs from amcostestimate
idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
idx2: 6258.230000 + 0.010 * 297823 = 9236.460000

These calculations are exactly right, but you overlooked the
breakdown of indexTotalCost for idx2.

Spooky! Although it seems like a mere coincidence, thanks to the nice
round numbers of tuples in the table, and lucky choice of two
conditions.

As said above, it is not a coincidence. Exactly the same
calculation on baserestrictinfo is simply performed in
different places: cost_index for the former and
btcostestimate (genericcostestimate) for the latter.
By "coincidence" I meant that we happened to choose such a number of
conditions in the index predicate & query that this perfect match is
possible. Apparently there are two places that manipulate the costs and
in this particular case happen to perfectly compensate the effects.
As demonstrated by the example with a single condition, the costs may
actually differ for different numbers of clauses (e.g. using a single
clause makes the wider index - unexpectedly - cheaper).
We should properly ignore or remove the implicitly-applied quals
for partial indexes on cost estimation.
Probably. So far I've traced the difference to build_index_paths() where
we build index_clauses by iterating over index columns - the smaller
index does not have the column from the predicate, so we don't add the
clause. I'm not particularly familiar with this part of the code, so I
wonder where's the best place to fix this, though.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, this looks to be a bug of cost_index(). The attached patch
would fix that.
=====
The following part in cost_index,
cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
run_cost += cpu_per_tuple * tuples_fetched;
adds cpu_tuple_cost (which is 0.01) + qpqual_cost.per_tuple
(0.0025) per tuple, even when they are index tuples. On the other
hand, genericcostestimate adds the following value for the same
purpose:

indexTotalCost += numIndexTuples * num_sa_scans * (cpu_index_tuple_cost + qual_op_cost);

cpu_index_tuple_cost is 0.005, just half of cpu_tuple_cost by
default. I think this is the culprit of the difference.

For confirmation, setting cpu_tuple_cost to 0.005 to equate it with
cpu_index_tuple_cost (and the opposite) makes the estimates for both
indexes the same value:
set cpu_tuple_cost to 0.005;
explain select a from t where b < 300000;
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx2 on t (cost=0.42..7022.06 rows=297876 width=4)
Index Cond: (b < 300000)
(2 rows)
explain select a from t where b < 300000;
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx1 on t (cost=0.42..7022.66 rows=297876 width=4)
(1 row)
This should be a bug. The attached patch would fix it, and then
the costs for all of your examples should match, except for
double-precision rounding errors. I think that is enough, since
an IndexOnlyScan cannot have quals on columns outside the index
in focus, so the qpquals should be index quals.
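As a quick sanity check on the per-tuple arithmetic above, using PostgreSQL's shipped default cost GUCs (the variable names below are only illustrative, not actual planner code):

```python
# PostgreSQL default cost GUCs
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.005
qual_op_cost = 0.0025  # cpu_operator_cost, per qual operator per tuple

# cost_index() charges a clause left in qpquals against every fetched tuple:
per_tuple_as_qpqual = cpu_tuple_cost + qual_op_cost

# genericcostestimate() charges the same clause, kept as an index qual,
# against every index tuple instead:
per_tuple_as_indexqual = cpu_index_tuple_cost + qual_op_cost

# The gap per tuple is exactly cpu_tuple_cost - cpu_index_tuple_cost,
# so which list a clause ends up in shifts the total estimate.
print(per_tuple_as_qpqual - per_tuple_as_indexqual)
```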
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment: indexonly_cost_bug.patch (text/x-patch)
Sorry.
Hi, this looks to be a bug of cost_index(). The attached patch
would fix that.
No, that's wrong. Please forget the patch. The qual in qpquals
is really an index qual, excluded only because it does not need
to be applied. The right way would be to remove the cost for
such qpquals in cost_index().
--
Kyotaro Horiguchi
NTT Open Source Software Center
I have rethought this from the beginning.
Sorry.
Hi, this looks to be a bug of cost_index(). The attached patch
would fix that.

No, that's wrong. Please forget the patch. The qual in qpquals
should be indexquals which is excluded because it is not
necessary to be applied. The right way would be remove the cost
for qpqual in cost_index().
Your patch allows an index only scan even if a qual contains a
non-index column, when the qual can be removed by implication
from the index predicate.

So the 'implied' clauses are not needed ever after: they should
be excluded from cost estimation, and they are not needed at
execution even if an index only scan finally turns out not to be
doable.

So the implied quals may be removed when building index paths,
but I think check_index_only is not the place.

Removing implied quals from index quals is not only for index
*only* scans, so the place for removing such quals is in
build_index_paths, in the loop of step 1. After removing the
quals there, check_index_only will naturally give the desired
result.
# I remember that I have tried the same or similar thing. I don't
# recall what made me give up then.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 09/14/2015 12:51 PM, Kyotaro HORIGUCHI wrote:
I don't think this is particularly related to the patch, because some of
the anomalies can be observed even on master. For example, let's force
the index scans to be non-IOS by requesting another column from the heap:
create table t (a int, b int, c int);
insert into t select i,i,i from generate_series(1,1000000) s(i);
create index idx1 on t (a) where b between 300000 and 600000;
create index idx2 on t (a,b) where b between 300000 and 600000;
analyze t;
vacuum t;
The indexes have exactly the same size (thanks to alignment of
IndexTuples), and should have exactly the same statistics:
test=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+------+-------+-------+-------+---------+-------------
public | idx1 | index | user | t | 6600 kB |
public | idx2 | index | user | t | 6600 kB |
(2 rows)
Now, let's see the query reading column 'c' (forcing heap fetches):
explain select c from t where b between 300000 and 600000;
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx1 on t (cost=0.42..10945.99 rows=300971 width=4)
(1 row)
drop index idx1;
set enable_bitmapscan = off;
explain select c from t where b between 300000 and 600000;
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx2 on t (cost=0.42..19688.08 rows=300971 width=4)
Index Cond: ((b >= 300000) AND (b <= 600000))
(2 rows)
I claim that either both or none of the indexes should use "Index
Cond". This is exactly the same issue that led to the strange
behavior after applying the patch, but in this case the heap access
actually introduces some additional cost, so the issue is not as
obvious.
But in reality the costs should be pretty much exactly the same - the
indexes have exactly the same size, statistics, selectivity etc.
Also, the plan difference suggests this is not merely a costing issue,
because while with idx1 (first plan) it was correctly detected we don't
need to evaluate the condition on the partial index, on idx2 that's not
true and we'll waste time doing that. So we probably can't just tweak
the costing a bit - this probably needs to be addressed when actually
building the index path.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, sorry in advance for the long, hardly-readable description.
At Mon, 14 Sep 2015 13:27:47 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55F6AF33.8000206@2ndquadrant.com>
I don't think this is particularly related to the patch, because some
of the anomalies can be observed even on master. For example, let's
force the index scans to be non-IOS by requesting another column from
the heap:
...
I claim that either both or none of the indexes should use "Index
Cond".
I think I now understand your point.
The planner decides whether to use a partial index far before
creating specific paths, when measuring baserel sizes. If some
partial index is implied by the restriction list,
check_partial_indexes() marks the index as predOK, which means
that the index is usable for the baserel scan even if the
restriction clauses don't match the index columns.

Then create_index_paths allows generating paths for partial
indexes with predOK. The index clauses of the created paths
don't contain clauses that don't match the index columns: the
clause list is empty for your first example, and the same as the
restrictinfo for the second.
Finally, create_indexscan_plan strips baserestrictinfo clauses
implied by the index predicate, in addition to the index
conditions. So neither of your examples has filter conditions.
The following example would show you the result of the above
steps.
explain select c from t where b between 300000 + 1 and 600000 and c = 3;
QUERY PLAN
------------------------------------------------------------------
Index Scan using idx1 on t (cost=0.42..11665.77 rows=1 width=4)
Filter: ((b >= 300001) AND (c = 3))
The restriction (b >= 300001 AND b <= 600000) implies the index
predicate (b >= 300000 AND b <= 600000), so idx1 is allowed to
be used. Conversely, the restriction clause b <= 600000 is
implied by the index predicate, so it is stripped; the other two
clauses are not, so they are shown and executed as the Filter.
But regardless of whether they are stripped as implied conditions
at planning time, the costs of all clauses that don't match the
index columns are added when creating index paths. That is one
cause of the cost error.
This is exactly the same reason that lead to the strange behavior
after applying the patch, but in this case the heap access actually
introduces some additional cost so the issue is not that obvious.
So you're right that check_index_only is doing the wrong thing:
it should properly ignore restrictions implied by the index
predicate, as your patch does. But cost_index doesn't know that
some non-index conditions in baserestrictinfo are ultimately
useless, and it assumes that non-index conditions are always
applied to heap tuples.
After all, what should be done to properly ignore the useless
conditions would be:

1. Make create_index_paths() build the list of restriction
   clauses which are implied by the index predicate of the index
   in focus. The list would be stored as a member of
   IndexOptInfo. Then create index clauses excluding the listed
   clauses, and call get_index_paths using them. Modify
   check_index_only() to ignore the listed clauses when pulling
   varattnos. This is similar to, but a bit different from, what
   I said in the previous mail.

2. Pass the listed clauses to the generated IndexPath.

3. Modify extract_nonindex_conditions to be called with the
   exclusion list, and exclude the listed clauses from the
   result of the function.

4. Make create_indexscan_plan extract qpqual excluding the
   exclusion list.

The same result could be obtained in a smarter way, but the steps
above will achieve the right decision on whether to do an index
only scan on a partial index, and a correct cost estimate that
properly ignores the unused restrictions.
Does this make sense for you?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello Horiguchi-san,
On 09/17/2015 12:45 PM, Kyotaro HORIGUCHI wrote:
Yes, this seems sane. I've been poking at this a bit too, and I
came to the same plan in general, except that I think it's better
to build a list of clauses that are *not* implied by the index,
because that's what we need in both cost_index and
check_index_only.
It also seems to me that this change (arguably a bug fix) should pretty
much make the original patch irrelevant, because check_index_only can
simply walk over the new list.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55FADEEB.4000907@2ndquadrant.com>
Yes, this seems sane. I've been poking at this a bit too, and I came
to the same plan in general, except that I think it's better to build
list of clauses that are *not* implied by the index, because that's
what we need both in cost_index and check_index_only.
I had intended to keep IndexOptInfo independent of its owning
RelOptInfo, but the exclusion list also bonds them tightly, and
one IndexOptInfo belongs to only one RelOptInfo, so there is no
need to isolate them. So storing the not-implied restriction
clauses in IndexOptInfo would be preferable.
It also seems to me that this change (arguably a bug fix) should
pretty much make the original patch irrelevant, because
check_index_only can simply walk over the new list.
Yeah. This seems to be a bug irrelevant to your index-only-scan
patch.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers