Indexam interface proposal

Started by Heikki Linnakangasalmost 19 years ago11 messages
#1Heikki Linnakangas
heikki@enterprisedb.com

Currently amgettuple returns one matching tuple at a time, in index
order. I'm proposing two changes to add support for
- candidate matches
- partial ordering

Those two features are essential to make clustered indexes work, and in
the future, binned bitmap indexes that don't have a bitmap for each
distinct value but for ranges of values.

There's a third feature looming in the future, that I haven't addressed:
- returning index values, for index-only scans or at least for filtering
rows before fetching heap tuples.

I'm proposing that we keep the one tuple per call nature of the
interface, but add a flag to mark candidate matches. index_getnext or
the executor would need to recheck the original quals for tuples marked
as candidates.

Another flag would be used to mean "this tuple marks the boundary of a
partial ordering". Let's call it boundary_mark for now.

For example, if an index scan returned tuples with the following keys,
with tuples on same line meaning the index doesn't know their relative
ordering.

1
3 4 2
5 8 6 7
9
10

amgettuple would return the above tuples like this:

1 3 4 2 5 8 6 7 9 10
* * * * *

where the tuples marked with * would have the boundary_mark-flag set. If
the plan requires ordered results, index_getnext would have to sort
tuples between two markers before returning them to the caller.

amgetmulti would also need to have the candidate-flag added as I
proposed in the "Bitmapindexscan changes" patch I sent earlier to
pgsql-patches.

This interface change would solve much of the ugliness of my GIT patch,
by generalizing the index quals checking and sorting code to index_getnext.

Another source of ugliness in the patch is in inserting new tuples.
Inserts need to reach to the heap to fetch heap tuples, to compare keys
when splitting a group. I don't see a clean fix for that, but I don't
think it's as bad as the index scan code.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Heikki Linnakangas (#1)
Re: Indexam interface proposal

On Mon, Mar 19, 2007 at 12:23:01PM +0000, Heikki Linnakangas wrote:

Currently amgettuple returns one matching tuple at a time, in index
order. I'm proposing two changes to add support for
- candidate matches

IIRC indexes can already ask to have the system recheck conditions on
returned tuples. For example GiST can return more tuples than actually
match. That's what the amopreqcheck column is for in pg_amop.

Or am I misunderstanding?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#3Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Martijn van Oosterhout (#2)
Re: Indexam interface proposal

Martijn van Oosterhout wrote:

On Mon, Mar 19, 2007 at 12:23:01PM +0000, Heikki Linnakangas wrote:

Currently amgettuple returns one matching tuple at a time, in index
order. I'm proposing two changes to add support for
- candidate matches

IIRC indexes can already ask to have the system recheck conditions on
returned tuples. For example GiST can return more tuples than actually
match. That's what the amopreqcheck column is for in pg_amop.

Right, except that flag is per operator in operator class, and what I'm
proposing is that the index could pass a flag per tuple in the scan.
Some tuples in the scan might need rechecking, some might not. The need
for rechecking in clustered indexes is orthogonal to the need arising
from the lossyness of GiST operators.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Teodor Sigaev
teodor@sigaev.ru
In reply to: Heikki Linnakangas (#3)
Re: Indexam interface proposal

Right, except that flag is per operator in operator class, and what I'm
proposing is that the index could pass a flag per tuple in the scan.

That might make sense even for GiST. Sometimes complex compressions is used in
GiST opclasses. If indexing value is rather small then it's stored in index as
is, but large value is compressed with lossy techniques. So, GiST might return a
tuple which is allowed to not recheck.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#5Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Teodor Sigaev (#4)
Re: Indexam interface proposal

Teodor Sigaev wrote:

Right, except that flag is per operator in operator class, and what
I'm proposing is that the index could pass a flag per tuple in the scan.

That might make sense even for GiST. Sometimes complex compressions is
used in GiST opclasses. If indexing value is rather small then it's
stored in index as is, but large value is compressed with lossy
techniques. So, GiST might return a tuple which is allowed to not recheck.

Interesting. So we'd add a flag to the index tuples in GiST indicating
if the tuple is lossily compressed or not. The compress-function would
set that flag when it performs lossy compression, and gistgettuple would
return it to the caller.

That would completely replace the current RECHECK-option we have, right?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Martijn van Oosterhout
kleptog@svana.org
In reply to: Teodor Sigaev (#4)
Re: Indexam interface proposal

On Mon, Mar 19, 2007 at 04:40:52PM +0300, Teodor Sigaev wrote:

Right, except that flag is per operator in operator class, and what I'm
proposing is that the index could pass a flag per tuple in the scan.

That might make sense even for GiST. Sometimes complex compressions is used
in GiST opclasses. If indexing value is rather small then it's stored in
index as is, but large value is compressed with lossy techniques. So, GiST
might return a tuple which is allowed to not recheck.

Given that rechecking requires Expr and state structures, maybe it would
be easier to make the operators RECHECK so the planner does the right
thing now, but make a flag that tells the index scan *not* to recheck
this tuple. That would seem slightly less work and fit better with the
existing code. (In other words, it's an optimisation rather than a big
change).

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#3)
Re: Indexam interface proposal

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Martijn van Oosterhout wrote:

IIRC indexes can already ask to have the system recheck conditions on
returned tuples. For example GiST can return more tuples than actually
match. That's what the amopreqcheck column is for in pg_amop.

Right, except that flag is per operator in operator class, and what I'm
proposing is that the index could pass a flag per tuple in the scan.

The reason for attaching the flag to operators is so that the system
(particularly the planner) can tell *which* conditions need to be
rechecked, and can prepare the necessary expression infrastructure.
I dislike the idea of having to be prepared to do that every time
for every indexscan. The notion of having to be prepared to sort
(according to what?) is even worse.

regards, tom lane

#8Teodor Sigaev
teodor@sigaev.ru
In reply to: Heikki Linnakangas (#5)
Re: Indexam interface proposal

Interesting. So we'd add a flag to the index tuples in GiST indicating
if the tuple is lossily compressed or not. The compress-function would
set that flag when it performs lossy compression, and gistgettuple would
return it to the caller.

Keys in GiST index may be another type than column on which index was created,
so gistgettuple can not return tuple in original form - but sometimes
gistgettuple may be sure that recheck isn't needed.

That would completely replace the current RECHECK-option we have, right?

Yeah, this is possible.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#9Teodor Sigaev
teodor@sigaev.ru
In reply to: Martijn van Oosterhout (#6)
Re: Indexam interface proposal

Given that rechecking requires Expr and state structures, maybe it would
be easier to make the operators RECHECK so the planner does the right
thing now, but make a flag that tells the index scan *not* to recheck
this tuple. That would seem slightly less work and fit better with the
existing code. (In other words, it's an optimisation rather than a big
change).

I like your suggestion - it's useful for GiST/GIN fulltext indexing.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#10Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Tom Lane (#7)
Re: Indexam interface proposal

Tom Lane wrote:

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Martijn van Oosterhout wrote:

IIRC indexes can already ask to have the system recheck conditions on
returned tuples. For example GiST can return more tuples than actually
match. That's what the amopreqcheck column is for in pg_amop.

Right, except that flag is per operator in operator class, and what I'm
proposing is that the index could pass a flag per tuple in the scan.

The reason for attaching the flag to operators is so that the system
(particularly the planner) can tell *which* conditions need to be
rechecked, and can prepare the necessary expression infrastructure.
I dislike the idea of having to be prepared to do that every time
for every indexscan.

I don't see any big down-side in preparing for that. We'd need to always
store the original index quals in the executor node, like we do now with
recheck-flagged operators, but that doesn't seem too bad to me.

I suppose we would want to keep the existing per-operator recheck-flag
and quals as it is, and add another field like indexqualorig to be used
to recheck tuples amgetnext flags as candidates.

The notion of having to be prepared to sort
(according to what?) is even worse.

That we wouldn't need for clustered indexes, if we change the current
design a bit. Either:
* store a sorted list of offsetnumbers for each group, instead of a bitmap,
* or store a bitmap like now, but require that heap tuples in a grouped
index tuple are in cluster order within the heap page.

The first option eats away some of the space savings, the second option
makes clustered indexes to become declustered quicker if there's
out-of-order updates or inserts.

Choosing either option would also reduce the CPU overhead of index
scans, because we could use binary search within a grouped index tuple.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#11Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#1)
Re: Indexam interface proposal

Added to TODO:

* Allow index scans to return matching index keys

http://archives.postgresql.org/pgsql-hackers/2007-03/msg01079.php

---------------------------------------------------------------------------

Heikki Linnakangas wrote:

Currently amgettuple returns one matching tuple at a time, in index
order. I'm proposing two changes to add support for
- candidate matches
- partial ordering

Those two features are essential to make clustered indexes work, and in
the future, binned bitmap indexes that don't have a bitmap for each
distinct value but for ranges of values.

There's a third feature looming in the future, that I haven't addressed:
- returning index values, for index-only scans or at least for filtering
rows before fetching heap tuples.

I'm proposing that we keep the one tuple per call nature of the
interface, but add a flag to mark candidate matches. index_getnext or
the executor would need to recheck the original quals for tuples marked
as candidates.

Another flag would be used to mean "this tuple marks the boundary of a
partial ordering". Let's call it boundary_mark for now.

For example, if an index scan returned tuples with the following keys,
with tuples on same line meaning the index doesn't know their relative
ordering.

1
3 4 2
5 8 6 7
9
10

amgettuple would return the above tuples like this:

1 3 4 2 5 8 6 7 9 10
* * * * *

where the tuples marked with * would have the boundary_mark-flag set. If
the plan requires ordered results, index_getnext would have to sort
tuples between two markers before returning them to the caller.

amgetmulti would also need to have the candidate-flag added as I
proposed in the "Bitmapindexscan changes" patch I sent earlier to
pgsql-patches.

This interface change would solve much of the ugliness of my GIT patch,
by generalizing the index quals checking and sorting code to index_getnext.

Another source of ugliness in the patch is in inserting new tuples.
Inserts need to reach to the heap to fetch heap tuples, to compare keys
when splitting a group. I don't see a clean fix for that, but I don't
think it's as bad as the index scan code.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +