Bitmap index scans use of filters on available columns

Started by Jeff Janesabout 10 years ago25 messages

jeff.janes@gmail.com

about 10 years ago

create table f as select (random()*100)::int as x, md5(random()::text)
as y from generate_series(1,1000000);

create index on f (x, y);
analyze verbose f; --dont vacuum

explain select * from f where x=5 and y like '%abc%';

QUERY PLAN
------------------------------------------------------------------------------
Bitmap Heap Scan on f (cost=382.67..9314.72 rows=1 width=37)
Recheck Cond: (x = 5)
Filter: (y ~~ '%abc%'::text)
-> Bitmap Index Scan on f_x_y_idx (cost=0.00..382.67 rows=10433 width=0)
Index Cond: (x = 5)

Is there a fundamental reason the filter on y is not being applied to
the index scan, rather than the heap scan? Should this be a to-do
item? This could avoid a lot of heap block reads, and also might
prevent the bitmap from overflowing work_mem and turning lossy.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Antonin Houska

ah@cybertec.at

about 10 years ago

In reply to: Jeff Janes (#1)

Re: Bitmap index scans use of filters on available columns

While prefix expression

y like 'abc%'

can be converted to

y >= 'abc'

(see expand_indexqual_opclause()), I'm not sure any kind of expansion is
possible for '%abc%' which would result in a b-tree searchable condition.

Jeff Janes <jeff.janes@gmail.com> wrote:

Is there a fundamental reason the filter on y is not being applied to
the index scan, rather than the heap scan?

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Nicolas Barbier

nicolas.barbier@gmail.com

about 10 years ago

In reply to: Antonin Houska (#2)

Re: Bitmap index scans use of filters on available columns

2015-11-04 Antonin Houska <ah@cybertec.at>:

While prefix expression

y like 'abc%'

can be converted to

y >= 'abc'

(see expand_indexqual_opclause()), I'm not sure any kind of expansion is
possible for '%abc%' which would result in a b-tree searchable condition.

I think the question is not about using the b-tree for checking the
condition, but about just retrieving the value for y from the index,
and just using that to check the condition before fetching the
corresponding tuple from the heap.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Nicolas Barbier (#3)

Re: Bitmap index scans use of filters on available columns

Nicolas Barbier <nicolas.barbier@gmail.com> writes:

2015-11-04 Antonin Houska <ah@cybertec.at>:

(see expand_indexqual_opclause()), I'm not sure any kind of expansion is
possible for '%abc%' which would result in a b-tree searchable condition.

I think the question is not about using the b-tree for checking the
condition, but about just retrieving the value for y from the index,
and just using that to check the condition before fetching the
corresponding tuple from the heap.

Bitmap index scans only return TID bitmaps, not index tuples; indeed
the underlying index may not store anything recognizable as tuples.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndQuadrant.com

about 10 years ago

In reply to: Tom Lane (#4)

Re: Bitmap index scans use of filters on available columns

On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nicolas Barbier <nicolas.barbier@gmail.com> writes:

2015-11-04 Antonin Houska <ah@cybertec.at>:

(see expand_indexqual_opclause()), I'm not sure any kind of expansion is
possible for '%abc%' which would result in a b-tree searchable

condition.

I think the question is not about using the b-tree for checking the
condition, but about just retrieving the value for y from the index,
and just using that to check the condition before fetching the
corresponding tuple from the heap.

Bitmap index scans only return TID bitmaps, not index tuples; indeed
the underlying index may not store anything recognizable as tuples.

Agreed, though the OP's question was asking why a Filter condition can't be
added to a BitmapIndexScan, so that non-qualifying rows can be excluded
from the TID bitmap it generates.

We generate this plan

postgres=# explain select * from f where x=5 and y like '%abc%';

QUERY PLAN

--------------------------------------------------------------------------

Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37)
Index Cond: (x = 5)
Filter: (y ~~ '%abc%'::text)

So it should be possible to do the Filter condition on the BitmapIndexScan.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Simon Riggs (#5)

Re: Bitmap index scans use of filters on available columns

Simon Riggs <simon@2ndquadrant.com> writes:

On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We generate this plan
Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37)
Index Cond: (x = 5)
Filter: (y ~~ '%abc%'::text)

So it should be possible to do the Filter condition on the BitmapIndexScan.

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

In the case at hand, the planner should have considered a plan of this
shape as well. Presumably it concluded it was more expensive than using
the bitmap approach. Jeff might try "set enable_bitmapscan = 0" and
compare the estimated and actual costs.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndQuadrant.com

about 10 years ago

In reply to: Tom Lane (#6)

Re: Bitmap index scans use of filters on available columns

On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We generate this plan
Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37)
Index Cond: (x = 5)
Filter: (y ~~ '%abc%'::text)

So it should be possible to do the Filter condition on the

BitmapIndexScan.

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

Still misunderstanding each other... sorry about that

If a btree can Filter y like that on an IndexScan, then it can also apply
that Filter on y when it is looking through rows before it adds them to the
bitmap.

I completely understand that it cannot return the value (of y) via the
bitmap.

We should be able to get a plan like this

explain select * from f where x=5 and y like '%abc%';

QUERY PLAN
------------------------------------------------------------
------------------
Bitmap Heap Scan on f (cost=382.67..9314.72 rows=1 width=37)
-> Bitmap Index Scan on f_x_y_idx (cost=0.00..382.67 rows=10433
width=0)
Index Cond: (x = 5)

Filter: (y ~~ '%abc%'::text)

if the bitmap stays non-lossy.

I see that plannodes.h says
"In a BitmapIndexScan plan node, the targetlist and qual fields are not
used and are always NIL. "
but it doesn't say why.

Why can't the BitmapIndexScan execute arbitrary expressions? (or What did
you mean by that phrase?)

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Simon Riggs (#7)

Re: Bitmap index scans use of filters on available columns

Simon Riggs <simon@2ndquadrant.com> writes:

On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

Still misunderstanding each other... sorry about that

If a btree can Filter y like that on an IndexScan, then it can also apply
that Filter on y when it is looking through rows before it adds them to the
bitmap.

btree applies no such filter in either case. "Filter" conditions are
applied outside the index AM --- and yes, I will resist any proposal to
relocate that responsibility.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndQuadrant.com

about 10 years ago

In reply to: Tom Lane (#8)

Re: Bitmap index scans use of filters on available columns

On 4 November 2015 at 16:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You're missing my point: that is possible in an indexscan, but *not* in

a

bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

Still misunderstanding each other... sorry about that

If a btree can Filter y like that on an IndexScan, then it can also apply
that Filter on y when it is looking through rows before it adds them to

the

bitmap.

btree applies no such filter in either case. "Filter" conditions are
applied outside the index AM --- and yes, I will resist any proposal to
relocate that responsibility.

I don't think anyone has argued that point, yet, we just don't have enough
info to agree yet.

As Jeff said in his OP, "Is there a fundamental reason the filter on y is
not being applied to
the index scan, rather than the heap scan?", which still stands.

Why would you resist? And/Or Why is that important?

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Tom Lane (#8)

Re: Bitmap index scans use of filters on available columns

On Wed, Nov 4, 2015 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

Still misunderstanding each other... sorry about that

If a btree can Filter y like that on an IndexScan, then it can also apply
that Filter on y when it is looking through rows before it adds them to the
bitmap.

btree applies no such filter in either case. "Filter" conditions are
applied outside the index AM --- and yes, I will resist any proposal to
relocate that responsibility.

I don't understand what you're arguing against here - as Simon I think
correctly points out, there seems to be a substantial performance
improvement possible here. It's not so much that we would apply the
Filter conditions outside the index AM as that we would have two kinds
of Index Quals that could be enforced by the index AM. Some quals
(such as x = 5) could be enforced by scanning only the relevant
portion of the index, while others (such as y like '%abc%') don't
abridge the portion of the index that needs to be scanned, but could
disqualify some index tuples before we go to the trouble of hitting
the heap.

The problem I see is this might involve testing quals against an index
tuple when the corresponding heap tuple isn't visible. This probably
isn't safe except for leakproof quals, and there might be some other
problems as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Jeff Janes

jeff.janes@gmail.com

about 10 years ago

In reply to: Tom Lane (#6)

Re: Bitmap index scans use of filters on available columns

On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We generate this plan
Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37)
Index Cond: (x = 5)
Filter: (y ~~ '%abc%'::text)

So it should be possible to do the Filter condition on the BitmapIndexScan.

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

I had thought it must already be able to execute arbitrary
expressions, due to the ability to already support user-defined btree
ops (and ops of non-btree types in the case of other index types).
Does that only require a restricted execution environment?

In the case at hand, the planner should have considered a plan of this
shape as well. Presumably it concluded it was more expensive than using
the bitmap approach. Jeff might try "set enable_bitmapscan = 0" and
compare the estimated and actual costs.

It is a simplified test case, I can get it to do about anything I want
by tweaking the size, selectivity, columns selected, and cache warmth.

My understanding is that the normal (not index-only) index scan, which
is thought to be slower and actually is slower with a hot cache, also
applies the filter on the heap tuple, not the index tuple. But since
it only matters much for large queries and large queries tend to
prefer bitmaps, I was less interested in that case. And bitmaps can
feed into BitmapAnd, which is very helpful in avoiding index-creep
where every query gets a new index tailored for it. I think we are
missing a trick here that could apply to lots of situations.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Jeff Janes (#11)

Re: Bitmap index scans use of filters on available columns

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

I had thought it must already be able to execute arbitrary
expressions, due to the ability to already support user-defined btree
ops (and ops of non-btree types in the case of other index types).

No. An index AM is only expected to be able to evaluate clauses of
the form <indexed_column> <indexable_operator> <constant>, and the key
restriction there is that the operator is one that the AM has volunteered
to support. Well, actually, it's the opclass more than the AM that
determines this, but anyway it's not just some random operator; more than
likely, the AM and/or opclass has got special logic about the operator.

This also ties into Robert's point about evaluation of operators against
index entries for dead or invisible rows. Indexable operators are much
less likely than others to have unexpected side-effects.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 10 years ago

In reply to: Tom Lane (#12)

Re: Bitmap index scans use of filters on available columns

Hi,

On 11/04/2015 11:32 PM, Tom Lane wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You're missing my point: that is possible in an indexscan, but
*not* in a bitmap indexscan, because the index AM APIs are
totally different in the two cases. In a bitmap scan, nothing
more than a TID bitmap is ever returned out to anyplace that
could execute arbitrary expressions.

I had thought it must already be able to execute arbitrary
expressions, due to the ability to already support user-defined
btree ops (and ops of non-btree types in the case of other index
types).

No. An index AM is only expected to be able to evaluate clauses of
the form <indexed_column> <indexable_operator> <constant>, and the
key restriction there is that the operator is one that the AM has
volunteered to support. Well, actually, it's the opclass more than
the AM that determines this, but anyway it's not just some random
operator; more than likely, the AM and/or opclass has got special
logic about the operator.

Isn't that pretty much exactly the point made by Jeff and Simon, that
index AM is currently only allowed to handle the indexable operators,
i.e. operators that it can explicitly optimize (e.g. use to walk the
btree and such), and completely ignores the other operators despite
having all the columns in the index. Which means we'll have to do the
heap fetch, which usually means a significant performance hit.

This also ties into Robert's point about evaluation of operators
against index entries for dead or invisible rows. Indexable operators
are much less likely than others to have unexpected side-effects.

I certainly understand there are cases that require care - like the
leakproof thing pointed out by Robert for example. I don't immediately
see why evaluation against dead rows would be a problem.

But maybe we can derive a set of rules required from the operators? Say
only those marked as leakproof when RLS is enabled on the table, and
perhaps additional things.

A "bruteforce" way would be to extend each index AM with every possible
operator, but that's not quite manageable I guess. But why couldn't we
provide a generic infrastructure that would allow filtering "safe"
expressions and validating them on an index tuple?

kind regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Tomas Vondra (#13)

Re: Bitmap index scans use of filters on available columns

On Wed, Nov 4, 2015 at 9:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I certainly understand there are cases that require care - like the
leakproof thing pointed out by Robert for example. I don't immediately see
why evaluation against dead rows would be a problem.

Well, one thing is that you might leak information about
already-deleted rows, which could be a security vulnerability, or more
mundanely cause a function to error out when there are no actually
visible rows that could trigger such an error. It would be surprising
if you could add a CHECK(b != 0) constraint to a table, query it for
rows where a/b>1, and get a division by zero error.

But I think it's even worse: I believe there can be dead rows in the
heap whose tuple descriptor doesn't even match the current
pg_attribute contents. Consider BEGIN; ALTER TABLE ... ADD COLUMN ...
int8; INSERT ...; ROLLBACK; ALTER TABLE .. ADD COLUMN .. text; SELECT
... If the SELECT tries to decode one of the tuples added by the
failed transaction considering the new column as text when the dead
tuples in question had it as an int8, I suspect that you can crash the
server. Nothing good will happen, anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 10 years ago

In reply to: Robert Haas (#14)

Re: Bitmap index scans use of filters on available columns

Hi,

On 11/05/2015 06:51 PM, Robert Haas wrote:

On Wed, Nov 4, 2015 at 9:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I certainly understand there are cases that require care - like the
leakproof thing pointed out by Robert for example. I don't immediately see
why evaluation against dead rows would be a problem.

Well, one thing is that you might leak information about
already-deleted rows, which could be a security vulnerability, or more
mundanely cause a function to error out when there are no actually
visible rows that could trigger such an error. It would be surprising
if you could add a CHECK(b != 0) constraint to a table, query it for
rows where a/b>1, and get a division by zero error.

Yes, I guess we don't have this problem when evaluating the expression
on heap because we get to check visibility first, and after moving the
expression to the index we can't do that.

But then again, can we come up with a way to distinguish operators that
are safe to evaluate on indexes - either automatic or manual? We already
do that with the indexable operators with explicitly listing them in the
opclass, and we also explicitly use the LEAKPROOF for a similar thing. I
don't think extending the opclass is a really good solution, but why not
to invent another *PROOF flag?

But I think it's even worse: I believe there can be dead rows in the
heap whose tuple descriptor doesn't even match the current
pg_attribute contents. Consider BEGIN; ALTER TABLE ... ADD COLUMN ...
int8; INSERT ...; ROLLBACK; ALTER TABLE .. ADD COLUMN .. text; SELECT
... If the SELECT tries to decode one of the tuples added by the
failed transaction considering the new column as text when the dead
tuples in question had it as an int8, I suspect that you can crash the
server. Nothing good will happen, anyway.

But that is about heap, and we're discussing indexes here, right? I
don't think you can break the index descriptor like this, because
otherwise we'd already have problems with that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Tomas Vondra (#15)

Re: Bitmap index scans use of filters on available columns

On Thu, Nov 5, 2015 at 1:29 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Well, one thing is that you might leak information about
already-deleted rows, which could be a security vulnerability, or more
mundanely cause a function to error out when there are no actually
visible rows that could trigger such an error. It would be surprising
if you could add a CHECK(b != 0) constraint to a table, query it for
rows where a/b>1, and get a division by zero error.

Yes, I guess we don't have this problem when evaluating the expression on
heap because we get to check visibility first, and after moving the
expression to the index we can't do that.

Right.

But then again, can we come up with a way to distinguish operators that are
safe to evaluate on indexes - either automatic or manual? We already do that
with the indexable operators with explicitly listing them in the opclass,
and we also explicitly use the LEAKPROOF for a similar thing. I don't think
extending the opclass is a really good solution, but why not to invent
another *PROOF flag?

I think LEAKPROOF is probably fine for this. How would the new thing
be different?

But that is about heap, and we're discussing indexes here, right? I don't
think you can break the index descriptor like this, because otherwise we'd
already have problems with that.

Oh, right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Amit Kapila

amit.kapila16@gmail.com

about 10 years ago

In reply to: Tomas Vondra (#13)

Re: Bitmap index scans use of filters on available columns

On Thu, Nov 5, 2015 at 7:45 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

Hi,

On 11/04/2015 11:32 PM, Tom Lane wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You're missing my point: that is possible in an indexscan, but
*not* in a bitmap indexscan, because the index AM APIs are
totally different in the two cases. In a bitmap scan, nothing
more than a TID bitmap is ever returned out to anyplace that
could execute arbitrary expressions.

I had thought it must already be able to execute arbitrary

expressions, due to the ability to already support user-defined
btree ops (and ops of non-btree types in the case of other index
types).

No. An index AM is only expected to be able to evaluate clauses of
the form <indexed_column> <indexable_operator> <constant>, and the
key restriction there is that the operator is one that the AM has
volunteered to support. Well, actually, it's the opclass more than
the AM that determines this, but anyway it's not just some random
operator; more than likely, the AM and/or opclass has got special
logic about the operator.

Isn't that pretty much exactly the point made by Jeff and Simon, that
index AM is currently only allowed to handle the indexable operators, i.e.
operators that it can explicitly optimize (e.g. use to walk the btree and
such), and completely ignores the other operators despite having all the
columns in the index. Which means we'll have to do the heap fetch, which
usually means a significant performance hit.

This also ties into Robert's point about evaluation of operators
against index entries for dead or invisible rows. Indexable operators
are much less likely than others to have unexpected side-effects.

I certainly understand there are cases that require care - like the
leakproof thing pointed out by Robert for example. I don't immediately see
why evaluation against dead rows would be a problem.

Apart from other problems discussed, I think it could also lead to
a performance penality for the cases when the qual condition is
costly as evaluating such a qual against lot of dead rows would be a
time consuming operation. I am not sure, but I think some of the
other well know databases might not have any such problems as
they store visibility info inside index due to which they don't need to
fetch the heap tuple for evaluating such quals.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#18

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

about 10 years ago

In reply to: Amit Kapila (#17)

Re: Bitmap index scans use of filters on available columns

Hello,

At Fri, 6 Nov 2015 09:49:30 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1L_5T-UYsQeMOrX54g3QeXGhhAk5YFmzZqu5MidzxzkRg@mail.gmail.com>

Apart from other problems discussed, I think it could also lead to
a performance penality for the cases when the qual condition is
costly as evaluating such a qual against lot of dead rows would be a
time consuming operation. I am not sure, but I think some of the
other well know databases might not have any such problems as
they store visibility info inside index due to which they don't need to
fetch the heap tuple for evaluating such quals.

I was junst thinking of the same thing. Can we estimate the
degree of the expected penalty using heap statistics? Of couse
not so accurate though.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 10 years ago

In reply to: Robert Haas (#16)

Re: Bitmap index scans use of filters on available columns

Hi,

On 11/05/2015 07:36 PM, Robert Haas wrote:

On Thu, Nov 5, 2015 at 1:29 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

But then again, can we come up with a way to distinguish operators that are
safe to evaluate on indexes - either automatic or manual? We already do that
with the indexable operators with explicitly listing them in the opclass,
and we also explicitly use the LEAKPROOF for a similar thing. I don't think
extending the opclass is a really good solution, but why not to invent
another *PROOF flag?

I think LEAKPROOF is probably fine for this. How would the new thing
be different?

I don't think so - AFAIK "leakproof" is about not revealing information
about arguments, nothing more and nothing less. It does not say whether
it's safe to evaluate on indexes, and I don't think we should extend the
meaning in this direction.

I find it perfectly plausible that there will be leakproof functions
that can't be pushed to indexes, and that would not be possible with a
single marker. Of course, "leakproof" may be a minimum requirement, but
another marker is needed to actually enable pushing the function to index.

Of course, we may eventually realize that leakproof really is
sufficient, and that we can push all leakproof functions to indexes. In
that case we may deprecate the other marker and just use leakproof. But
if we go just with leakproof and later find that we really need two
markers because not all leakproof functions are not index-pushable,
it'll be much harder to fix because it will cause performance
regressions for the users (some of the expressions won't be pushed to
indexes anymore).

But I think the marker is the last thing we need to worry about.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Tomas Vondra (#19)

Re: Bitmap index scans use of filters on available columns

On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I think LEAKPROOF is probably fine for this. How would the new thing
be different?

I don't think so - AFAIK "leakproof" is about not revealing information
about arguments, nothing more and nothing less. It does not say whether it's
safe to evaluate on indexes, and I don't think we should extend the meaning
in this direction.

That seems like a non-answer answer.

Clearly, if a function can leak information about it's arguments, for
example by throwing an error, we can't call it on tuples that might
not even be visible, or the behavior of the query might be change. So
any function that is not leakproof is also not safe for this purpose.

Now that doesn't rule out the possibility that the functions for which
this optimization is safe are a subset of the leakproof functions -
but offhand I don't see why that should be the case. The index tuple
is just a tuple, and the values it contains are just datums, no more
or less than in a heap tuple. There could be a reason for concern
here, but saying that it might not be "safe to evaluate on indexes"
just begs the question: WHY wouldn't that be safe?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 10 years ago

In reply to: Robert Haas (#20)

Re: Bitmap index scans use of filters on available columns

Hi,

On 11/07/2015 02:18 AM, Robert Haas wrote:

On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I think LEAKPROOF is probably fine for this. How would the new thing
be different?

I don't think so - AFAIK "leakproof" is about not revealing information
about arguments, nothing more and nothing less. It does not say whether it's
safe to evaluate on indexes, and I don't think we should extend the meaning
in this direction.

That seems like a non-answer answer.

I'm not claiming I have an answer, really. My knowledge of leakproof
stuff is a bit shallow. Also, I had a few beers earlier today, which
does not really improve the depth of knowledge on any topic except
politics and football (aka soccer). So you may be perfectly right.

Clearly, if a function can leak information about it's arguments,
for example by throwing an error, we can't call it on tuples that
might not even be visible, or the behavior of the query might be
change. So any function that is not leakproof is also not safe for
this purpose.

Now that doesn't rule out the possibility that the functions for
which this optimization is safe are a subset of the leakproof
functions - but offhand I don't see why that should be the case. The
index tuple is just a tuple, and the values it contains are just
datums, no more or less than in a heap tuple. There could be a reason
for concern here, but saying that it might not be "safe to evaluate
on indexes" just begs the question: WHY wouldn't that be safe?

Ah. For some reason I thought the "division by zero" example is one of
the non-safe cases, but now I see int4div is not marked as leakproof, so
we couldn't push it to index anyway.

I've however also noticed that all the 'like' procedures are marked as
not leak proof, which is a bit unfortunate because that's the example
from Jeff's e-mail that started this thread.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Amit Kapila

amit.kapila16@gmail.com

about 10 years ago

In reply to: Kyotaro HORIGUCHI (#18)

Re: Bitmap index scans use of filters on available columns

On Fri, Nov 6, 2015 at 10:28 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Fri, 6 Nov 2015 09:49:30 +0530, Amit Kapila <amit.kapila16@gmail.com>

wrote in <CAA4eK1L_5T-UYsQeMOrX54g3QeXGhhAk5YFmzZqu5MidzxzkRg@mail.gmail.com

Apart from other problems discussed, I think it could also lead to
a performance penality for the cases when the qual condition is
costly as evaluating such a qual against lot of dead rows would be a
time consuming operation. I am not sure, but I think some of the
other well know databases might not have any such problems as
they store visibility info inside index due to which they don't need to
fetch the heap tuple for evaluating such quals.

I was junst thinking of the same thing. Can we estimate the
degree of the expected penalty using heap statistics? Of couse
not so accurate though.

I think so. Information related to number of tuples inserted, updated,
hot-updated, deleted and so on is maintained and is shown by
pg_stat_all_tables. By the way, whats your point here, do you mean to
say that we can estimate approximate penality and then decide whether
quals should be evaluated at index level?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#23

Jeff Janes

jeff.janes@gmail.com

about 10 years ago

In reply to: Tomas Vondra (#21)

Re: Bitmap index scans use of filters on available columns

On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hi,

On 11/07/2015 02:18 AM, Robert Haas wrote:

On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I think LEAKPROOF is probably fine for this. How would the new thing
be different?

I don't think so - AFAIK "leakproof" is about not revealing information
about arguments, nothing more and nothing less. It does not say whether
it's
safe to evaluate on indexes, and I don't think we should extend the
meaning
in this direction.

That seems like a non-answer answer.

I'm not claiming I have an answer, really. My knowledge of leakproof stuff
is a bit shallow. Also, I had a few beers earlier today, which does not
really improve the depth of knowledge on any topic except politics and
football (aka soccer). So you may be perfectly right.

Clearly, if a function can leak information about it's arguments,
for example by throwing an error, we can't call it on tuples that
might not even be visible, or the behavior of the query might be
change. So any function that is not leakproof is also not safe for
this purpose.

Now that doesn't rule out the possibility that the functions for
which this optimization is safe are a subset of the leakproof
functions - but offhand I don't see why that should be the case. The
index tuple is just a tuple, and the values it contains are just
datums, no more or less than in a heap tuple. There could be a reason
for concern here, but saying that it might not be "safe to evaluate
on indexes" just begs the question: WHY wouldn't that be safe?

Ah. For some reason I thought the "division by zero" example is one of the
non-safe cases, but now I see int4div is not marked as leakproof, so we
couldn't push it to index anyway.

I've however also noticed that all the 'like' procedures are marked as not
leak proof, which is a bit unfortunate because that's the example from
Jeff's e-mail that started this thread.

Is there a reason they aren't leak proof? I don't see that they have
side effects, or that they can throw errors. What do they leak?

Anyway, I just picked that for convenience. One of the original
pieces that lead me on this quest had a range inclusion rather than a
LIKE. I just thought that changing it to a LIKE made the data
generator simpler and easier to reason about, without changing the
fundamental principle.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Jeff Janes (#23)

Re: Bitmap index scans use of filters on available columns

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I've however also noticed that all the 'like' procedures are marked as not
leak proof, which is a bit unfortunate because that's the example from
Jeff's e-mail that started this thread.

Is there a reason they aren't leak proof? I don't see that they have
side effects, or that they can throw errors. What do they leak?

Huh?

regression=# select 'z' like '\';
ERROR: LIKE pattern must not end with escape character

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

Jeff Janes

jeff.janes@gmail.com

about 10 years ago

In reply to: Tom Lane (#24)

Re: Bitmap index scans use of filters on available columns

On Sun, Nov 8, 2015 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I've however also noticed that all the 'like' procedures are marked as not
leak proof, which is a bit unfortunate because that's the example from
Jeff's e-mail that started this thread.

Is there a reason they aren't leak proof? I don't see that they have
side effects, or that they can throw errors. What do they leak?

Huh?

regression=# select 'z' like '\';
ERROR: LIKE pattern must not end with escape character

Ah, I was only thinking of the pattern as a constant. And there is no
way to declare a function leakproof on one input but not another.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers