Covering GiST indexes

Started by Andrey Borodinabout 8 years ago28 messageshackers
Jump to latest
#1Andrey Borodin
amborodin@acm.org

Hi, hackers!

Looks like we finally have covering indexes! And that's cool!

So I decided to create a thread to discuss covering GiST indexes.
Here's a prototype patch implementing this functionality.
It is quite small (+80 -30) and lacks tests and docs. But it creates a context.

I have two concerns.
First one is about INDEX_AM_RESERVED_BIT.
B-tree uses it as a base for prefix truncation (I'm not quite sure why it is usually called suffix truncation, but this is a matter for other thread).
GiST , probably, will not use [pre\su]fix truncation. But I'd like to use that 13th bit to implement intra-page indexing - a way to improve search within gist page. See [0,1]

Second, currently including indexes do not allow same attributes in both keys and include parts.
# create index on x using gist(c) include (c);
ERROR: included columns must not intersect with key columns

But it makes sense for example for geometries like PostGIS. Index keys are truncated to small MBRs while having whole complex geometry right in an index could be handy.

Any feedback will be appreciated.

Best regards, Andrey Borodin.

[0]: /messages/by-id/7780A07B-4D04-41E2-B228-166B41D07EEE@yandex-team.ru
[1]: /messages/by-id/CAJEAwVE0rrr+OBT-P0gDCtXbVDkBBG_WcXwCBK=GHo4fewu3Yg@mail.gmail.com

Attachments:

0001-Covering-Gist.patchapplication/octet-stream; name=0001-Covering-Gist.patch; x-unix-mode=0644Download+81-31
#2Teodor Sigaev
teodor@sigaev.ru
In reply to: Andrey Borodin (#1)
Re: Covering GiST indexes

Interesting work. I don't have a time now to learn deep your patch, so, add it
to next commitfest, pls. First of all I'd like to see more tests in patch, not
only CREATE INDEX.

Andrey Borodin wrote:

Hi, hackers!

Looks like we finally have covering indexes! And that's cool!

So I decided to create a thread to discuss covering GiST indexes.
Here's a prototype patch implementing this functionality.
It is quite small (+80 -30) and lacks tests and docs. But it creates a context.

I have two concerns.
First one is about INDEX_AM_RESERVED_BIT.
B-tree uses it as a base for prefix truncation (I'm not quite sure why it is usually called suffix truncation, but this is a matter for other thread).
GiST , probably, will not use [pre\su]fix truncation. But I'd like to use that 13th bit to implement intra-page indexing - a way to improve search within gist page. See [0,1]

Second, currently including indexes do not allow same attributes in both keys and include parts.
# create index on x using gist(c) include (c);
ERROR: included columns must not intersect with key columns

But it makes sense for example for geometries like PostGIS. Index keys are truncated to small MBRs while having whole complex geometry right in an index could be handy.

Any feedback will be appreciated.

Best regards, Andrey Borodin.

[0] /messages/by-id/7780A07B-4D04-41E2-B228-166B41D07EEE@yandex-team.ru
[1] /messages/by-id/CAJEAwVE0rrr+OBT-P0gDCtXbVDkBBG_WcXwCBK=GHo4fewu3Yg@mail.gmail.com

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#3Aleksander Alekseev
aleksander@timescale.com
In reply to: Teodor Sigaev (#2)
Re: Covering GiST indexes

Hello Andrey,

So I decided to create a thread to discuss covering GiST indexes.
Here's a prototype patch implementing this functionality. It is quite
small (+80 -30) and lacks tests and docs. But it creates a context.

I'm glad you got interested in this area. It would be great to have
covering indexes support for GiST as well. Please don't forget to add
the thread to the nearest commitfest entry. I'm looking forward for the
next versions of your patch!

--
Best regards,
Aleksander Alekseev

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#1)
Re: Covering GiST indexes

Hi, Andrey!

On Thu, Apr 12, 2018 at 2:00 PM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

Looks like we finally have covering indexes! And that's cool!

Thank you!

So I decided to create a thread to discuss covering GiST indexes.

Here's a prototype patch implementing this functionality.
It is quite small (+80 -30) and lacks tests and docs. But it creates a
context.

I have two concerns.
First one is about INDEX_AM_RESERVED_BIT.
B-tree uses it as a base for prefix truncation (I'm not quite sure why it
is usually called suffix truncation, but this is a matter for other thread).
GiST , probably, will not use [pre\su]fix truncation. But I'd like to use
that 13th bit to implement intra-page indexing - a way to improve search
within gist page. See [0,1]

I also think that GiST isn't going to do suffix truncation of keys, because
GiST
is composing keys for every attribute and trying to use them in queries if
even GiST didn't use those attributes in any split. Depending on data
distribution,
key representation, query and so on that keys may appear useful or not.
And GiST doesn't have any legal interface to determine that in advance.

I think that GiST needs another feature: not suffix truncation, but
different
attribute representation in leaf pages and internal pages. For example,
now GiST on points stores boxes in leafs. That's a great waste of space.
So, we might just have different tuple descriptors for internal and leaf
pages of GiST, which could have both different attribute types and
different counts of attributes. Thankfully GiST pages don't have high keys
and we don't have to mix tuples of different types on the same page
(unlike B-tree).

Second, currently including indexes do not allow same attributes in both

keys and include parts.
# create index on x using gist(c) include (c);
ERROR: included columns must not intersect with key columns

But it makes sense for example for geometries like PostGIS. Index keys are
truncated to small MBRs while having whole complex geometry right in an
index could be handy.

The issue discovered in [1] seems to be related. Thus, after fix provided
there when
column is indexed without index-only scan support, then it can't be used in
index-only
scan anyway (if even it's indexed another time with index-only scan
support).
So, I think we need a better fix for [1] first.

Another thing that could be done for PostGIS geometries is just another
opclass which
stores geometries "as is" in leafs. As I know, geometries contain MBRs
inside their
own, so there is no need to store extra MBR. I think the reason why PostGIS
doesn't have such opclass yet is that geometries are frequently large and
easily
can exceed maximum size of index tuple. The same problem applies to
coverting
indexes too. Possible solution here might be to support external toasted
attributes
in covering indexes. But I think that still requires quite amount of work.

1.
/messages/by-id/1516210494.1798.16.camel@nlpgo.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In reply to: Andrey Borodin (#1)
Re: Covering GiST indexes

On Thu, Apr 12, 2018 at 4:00 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

I have two concerns.
First one is about INDEX_AM_RESERVED_BIT.
B-tree uses it as a base for prefix truncation (I'm not quite sure why it is usually called suffix truncation, but this is a matter for other thread).

Since you brought it up, and since I pushed that particular
terminology, I should acknowledge that the original 1977 Bayer paper
on suffix truncation calls a B-Tree with suffix truncation a prefix
B-Tree. However, more recent work seems to consistently refer to the
technique as suffix truncation, while also referring to more advanced
techniques for compressing (not truncating) leaf tuples as prefix
compression.

I suggested suffix truncation because it seemed to be the dominant way
of referring to the technique. And, because it seemed more logical:
the suffix is what gets truncated away.

--
Peter Geoghegan

In reply to: Alexander Korotkov (#4)
Re: Covering GiST indexes

Another thing that could be done for PostGIS geometries is just another
opclass which
stores geometries "as is" in leafs. As I know, geometries contain MBRs
inside their
own, so there is no need to store extra MBR. I think the reason why
PostGIS
doesn't have such opclass yet is that geometries are frequently large and
easily
can exceed maximum size of index tuple.

Geometry datatype layout was designed with TOASTing in mind: most of data
is stored in the header, including type, SRID, box and some other flags, so
getting just several first bytes tells you a lot.

PostGIS datasets are often of a mixed binary length: in buildings, for
example, it is quite common to have a lot of four corner houses, and just
one mapped as a circle, that digitizing software decided to make via
720-point polygon. Since reading it from TOAST for an index would require a
seek of some kind, it may be as efficient to just look it up in the table.

This way a lossy decompress function can help with index only scans: up to
some binary length, try to store the original geometry in the index. After
that, store a shape that has less points in it but covers slightly larger
area, plus a flag that it's not precise. There are a lot of ways to
generate a covering shape with less points: obvious is a box, less obvious
is non axis aligned box, a collection of boxes for a multipart shape, an
outer ring for an area with lots of holes, a convex hull or a smallest
enclosing k-gon.

In GIS there is a problem of border of Russia: the country overlaps over
180 meridian and has a complex border shape. if you take a box of it, it
spans from -180 to 180. If you query any spot in US or in Europe, you'll
have it intersecting with your area, require a full recheck, complete
detoast and decompression, and then "no, it's not a thing we need, skip".
Allowing anything better than a box would help. If we're allowing a complex
shape - we've got the container for it, geometry.

If we're storing geometry in index and original's small, why not allow
complete Index Only Scan on it, and let it skip recheck? :)

Darafei Praliaskouski,
GIS Engineer / Juno Minsk

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Darafei "Komяpa" Praliaskouski (#6)
Re: Covering GiST indexes

On Thu, Apr 12, 2018 at 11:20 PM, Darafei "Komяpa" Praliaskouski <
me@komzpa.net> wrote:

Another thing that could be done for PostGIS geometries is just another

opclass which
stores geometries "as is" in leafs. As I know, geometries contain MBRs
inside their
own, so there is no need to store extra MBR. I think the reason why
PostGIS
doesn't have such opclass yet is that geometries are frequently large and
easily
can exceed maximum size of index tuple.

Geometry datatype layout was designed with TOASTing in mind: most of data
is stored in the header, including type, SRID, box and some other flags, so
getting just several first bytes tells you a lot.

PostGIS datasets are often of a mixed binary length: in buildings, for
example, it is quite common to have a lot of four corner houses, and just
one mapped as a circle, that digitizing software decided to make via
720-point polygon. Since reading it from TOAST for an index would require a
seek of some kind, it may be as efficient to just look it up in the table.

This way a lossy decompress function can help with index only scans: up to
some binary length, try to store the original geometry in the index. After
that, store a shape that has less points in it but covers slightly larger
area, plus a flag that it's not precise. There are a lot of ways to
generate a covering shape with less points: obvious is a box, less obvious
is non axis aligned box, a collection of boxes for a multipart shape, an
outer ring for an area with lots of holes, a convex hull or a smallest
enclosing k-gon.

In GIS there is a problem of border of Russia: the country overlaps over
180 meridian and has a complex border shape. if you take a box of it, it
spans from -180 to 180. If you query any spot in US or in Europe, you'll
have it intersecting with your area, require a full recheck, complete
detoast and decompression, and then "no, it's not a thing we need, skip".
Allowing anything better than a box would help. If we're allowing a complex
shape - we've got the container for it, geometry.

If we're storing geometry in index and original's small, why not allow
complete Index Only Scan on it, and let it skip recheck? :)

So, as I get the idea is that geometries has very different sizes
from very small to very large. And it would be nice to store small
geometries in the index "as is". And then for small geomentries
we can do index only scan and recheck while for large geometries
we have to visit heap for fetching geometry.

That can be done in custom opclass without work in PostgreSQL
core except index only scan. Right now optimizer expects index
to be always capable to return original datum for some column,
or to be never capable to do this. It doesn't allow this decision to
be done in runtime. Allowing this would require patch to PostgreSQL
core. This patch shouldn't be hard at the executor size. But
optimizer part of this patch seems hard, because I don't know
how to estimate fraction of index keys, which can be used to
reconstruct original datums. Probably that would require
GiST compress method to return some statistics...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#8Andrey Borodin
amborodin@acm.org
In reply to: Teodor Sigaev (#2)
Re: Covering GiST indexes

12 апр. 2018 г., в 17:03, Teodor Sigaev <teodor@sigaev.ru> написал(а):

Interesting work. I don't have a time now to learn deep your patch, so, add it to next commitfest, pls.

Thanks! Sure, I'll add it.

First of all I'd like to see more tests in patch, not only CREATE INDEX.

Here's V2, with basic set of tests.
Currently, I'm investigating what to document and more places to tests.

Also, I do not use generic function index_truncate_tuple(), because it deforms and then forms tuple, but GiST usually have already deformed tuple.

Best regards, Andrey Borodin.

Attachments:

0001-Covering-GiST-v2.patchapplication/octet-stream; name=0001-Covering-GiST-v2.patch; x-unix-mode=0644Download+328-41
#9Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#8)
Re: Covering GiST indexes

Hi!

13 апр. 2018 г., в 17:36, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):

Here's V2, with basic set of tests.
Currently, I'm investigating what to document and more places to tests.

Here is v3 version of the patch. I've fixed some comments and added some words to docs.

Best regards, Andrey Borodin.

Attachments:

0001-Covering-GiST-v3.patchapplication/octet-stream; name=0001-Covering-GiST-v3.patch; x-unix-mode=0644Download+334-42
#10Thomas Munro
thomas.munro@gmail.com
In reply to: Andrey Borodin (#9)
Re: Covering GiST indexes

On Fri, Jul 6, 2018 at 5:27 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Here is v3 version of the patch. I've fixed some comments and added some words to docs.

Hi again Andrey,

Cfbot reported the following difference (twice) in the
index_including_gist test:

- ERROR: included columns must not intersect with key columns
+ ERROR: relation "tbl_gist_idx" already exists

Well, both of those errors are valid complaints in this case... maybe
the order of checks changed in master since you wrote the patch?
Probably better to use a different name for the index anyway.

--
Thomas Munro
http://www.enterprisedb.com

#11Andrey Borodin
amborodin@acm.org
In reply to: Thomas Munro (#10)
Re: Covering GiST indexes

Hi, Thomas!

29 июля 2018 г., в 14:28, Thomas Munro <thomas.munro@enterprisedb.com> написал(а):

On Fri, Jul 6, 2018 at 5:27 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Here is v3 version of the patch. I've fixed some comments and added some words to docs.

Hi again Andrey,

Cfbot reported the following difference (twice) in the
index_including_gist test:

- ERROR: included columns must not intersect with key columns
+ ERROR: relation "tbl_gist_idx" already exists

Well, both of those errors are valid complaints in this case... maybe
the order of checks changed in master since you wrote the patch?
Probably better to use a different name for the index anyway.

Thanks! The problem appeared with commit 701fd0b [0]https://github.com/postgres/postgres/commit/701fd0bbc98fe8211d36e96f90753985104cd295 <https://github.com/postgres/postgres/commit/701fd0bbc98fe8211d36e96f90753985104cd295&gt; which dropped validation rules checked in failed test. Here's the patch with fixed tests.

Best regards, Andrey Borodin.

[0]: https://github.com/postgres/postgres/commit/701fd0bbc98fe8211d36e96f90753985104cd295 <https://github.com/postgres/postgres/commit/701fd0bbc98fe8211d36e96f90753985104cd295&gt;

Attachments:

0001-Covering-GiST-v4.patchapplication/octet-stream; name=0001-Covering-GiST-v4.patch; x-unix-mode=0644Download+324-42
#12Thomas Munro
thomas.munro@gmail.com
In reply to: Andrey Borodin (#11)
Re: Covering GiST indexes

On Sun, Jul 29, 2018 at 10:50 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Thanks! The problem appeared with commit 701fd0b [0] which dropped
validation rules checked in failed test. Here's the patch with fixed tests.

Thanks. I received the attachment.

Just as an FYI, something about the way your mail client encoded it
prevented the mail archives from picking it up (according to someone
who knows much more about these things than me, it's buried in the
"multipart/mixed" part, so not visible to the mail archive if it
extracts only the "text/plain" part, or something like that). That
didn't happen on any of your other patches. I have no opinion of
whether that's a bug in the mail archive or an unhelpful mail client
or a non-ideal way to post patches, but you can see here that we don't
have the patch in the usual place:

/messages/by-id/850AE105-F89A-42E4-AD40-FBC6EA5A5A00@yandex-team.ru

That explains why cfbot didn't automatically test your new patch, so
it still show as broken here:
http://cfbot.cputube.org/andrey-borodin.html

--
Thomas Munro
http://www.enterprisedb.com

#13Andrey Borodin
amborodin@acm.org
In reply to: Thomas Munro (#12)
Re: Covering GiST indexes

Thanks, Thomas!

30 июля 2018 г., в 3:58, Thomas Munro <thomas.munro@enterprisedb.com> написал(а):

On Sun, Jul 29, 2018 at 10:50 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Thanks! The problem appeared with commit 701fd0b [0] which dropped
validation rules checked in failed test. Here's the patch with fixed tests.

Thanks. I received the attachment.

Just as an FYI, something about the way your mail client encoded it
prevented the mail archives from picking it up

I'll try to investigate this.

Here's patch one more: another attempt to put it into archives.

Best regards, Andrey Borodin.

Attachments:

0001-Covering-GiST-v4.patchapplication/octet-stream; name=0001-Covering-GiST-v4.patch; x-unix-mode=0644Download+324-42
#14Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andrey Borodin (#13)
Re: Covering GiST indexes

On Mon, Jul 30, 2018 at 7:50 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Thanks, Thomas!

30 июля 2018 г., в 3:58, Thomas Munro <thomas.munro@enterprisedb.com> написал(а):

On Sun, Jul 29, 2018 at 10:50 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Thanks! The problem appeared with commit 701fd0b [0] which dropped
validation rules checked in failed test. Here's the patch with fixed tests.

Thanks. I received the attachment.

Just as an FYI, something about the way your mail client encoded it
prevented the mail archives from picking it up

I'll try to investigate this.

Here's patch one more: another attempt to put it into archives.

Hi,

Looks like this time the patch was picked up correctly, but there are some
conflicts with the current master branch:
http://cfbot.cputube.org/patch_20_1615.log
Could you please rebase it one more time?

#15Andrey Borodin
amborodin@acm.org
In reply to: Dmitry Dolgov (#14)
Re: Covering GiST indexes

Hi, Dmitry!

26 нояб. 2018 г., в 21:20, Dmitry Dolgov <9erthalion6@gmail.com> написал(а):

Looks like this time the patch was picked up correctly, but there are some
conflicts with the current master branch:
http://cfbot.cputube.org/patch_20_1615.log
Could you please rebase it one more time?

Here's rebased version. Thanks!

Best regards, Andrey Borodin.

Attachments:

0001-Covering-GiST-v5.patchapplication/octet-stream; name=0001-Covering-GiST-v5.patch; x-unix-mode=0644Download+324-42
#16Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andrey Borodin (#15)
Re: Covering GiST indexes

On 11/26/18 5:56 PM, Andrey Borodin wrote:

Here's rebased version. Thanks!

Here is my review.

= General

The features seems useful and an obvious extension of covering B-trees,
and a potentially useful feature together with exclusion constraints.

The patch still applies (with git apply -3), compiles and passes the
test suite.

From some testing it seems like it works as expected.

= Code

* Have some minor style issues like that there should be spaces around
|| (in gistcanreturn()) and ? and : (in gistFormTuple()).

* I do not see any need for adding the new parameter to gistFormTuple.
As far as I can tell isTruncated is always equal to !isleaf.

* The comment below from gistFormTuple() does not look correct. No
allocation is taking place.

/*
* Allocate each included attribute.
*/

* Why do you set a supportCollation for the included columns? As far as
I can tell the collation is only used by the various GiST support
functions. Also in theory we should be able to skip initializing these
array entires, but it is probably for the best that we do.

* I think you should check for the attno first in gistcanreturn() to
take advantage of the short circuiting.

* I am no fan of the tupdesc vs truncTupdesc separation and think that
it is a potential hazard, but I do not have any better suggestion right now.

* There is no test case for exclusion constraints, and I feel since that
is one of the more important uses we should probably have at least one
such test case.

= Minor comments

* I think that "the" should have been kept in the text below.

-        Currently, only the B-tree index access method supports this 
feature.
+        Currently, B-tree and GiST index access methods supports this 
feature.

* I am not a native speaker but I think it should be "use the INCLUDE
clause" in the diff below, and I think it also should be "without any
GiST operator class".

+   A GiST index can be covering, i.e. use <literal>INCLUDE</literal> 
clause.
+   Included columns can have data types without GiST operator class. 
Included
+   attributes will be stored uncompressed.

* I think you should write something like "Included attributes always
support index-only scans." rather than "Included attributes can be
returned." in the comment for gistcanreturn().

* Why the random noise in the diff below? I think using "(c3) INCLUDE
(c4)" for both gist and rtree results in a cleaner patch.

  CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
  CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
-CREATE INDEX on tbl USING gist(c3) INCLUDE (c4);
+CREATE INDEX on tbl USING gist(c4) INCLUDE (c3);
  CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
  CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
  CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
-CREATE INDEX on tbl USING rtree(c1, c2) INCLUDE (c3, c4);
+CREATE INDEX on tbl USING rtree(c4) INCLUDE (c3, c1);
  CREATE INDEX on tbl USING btree(c1, c2) INCLUDE (c3, c4);
#17Andrey Borodin
amborodin@acm.org
In reply to: Andreas Karlsson (#16)
Re: Covering GiST indexes

Thank you very much for reviewing the patch!

28 янв. 2019 г., в 12:15, Andreas Karlsson <andreas@proxel.se> написал(а):

= Code

* Have some minor style issues like that there should be spaces around || (in gistcanreturn()) and ? and : (in gistFormTuple()).

Fixed.

* I do not see any need for adding the new parameter to gistFormTuple. As far as I can tell isTruncated is always equal to !isleaf.

You are right. I've removed isTruncated parameter.

* The comment below from gistFormTuple() does not look correct. No allocation is taking place.

/*
* Allocate each included attribute.
*/

Fixed.

* Why do you set a supportCollation for the included columns? As far as I can tell the collation is only used by the various GiST support functions. Also in theory we should be able to skip initializing these array entires, but it is probably for the best that we do.

Removed supportCollation.

* I think you should check for the attno first in gistcanreturn() to take advantage of the short circuiting.

Done.

* I am no fan of the tupdesc vs truncTupdesc separation and think that it is a potential hazard, but I do not have any better suggestion right now.

B-tree is copying tupdesc every time they truncate tuple. We need tuple truncation a little more often: when we are doing page split, we have to form all page tuples, truncated.
Initially, I've optimized only this case, but this led to prepared tupledesc for truncated tuples.

* There is no test case for exclusion constraints, and I feel since that is one of the more important uses we should probably have at least one such test case.

Actually, I did not understand this test case. Can you elaborate more on this? How included attributes should participate in exclude index? What for?

= Minor comments

* I think that "the" should have been kept in the text below.

-        Currently, only the B-tree index access method supports this feature.
+        Currently, B-tree and GiST index access methods supports this feature.

Fixed.

* I am not a native speaker but I think it should be "use the INCLUDE clause" in the diff below, and I think it also should be "without any GiST operator class".

+   A GiST index can be covering, i.e. use <literal>INCLUDE</literal> clause.
+   Included columns can have data types without GiST operator class. Included
+   attributes will be stored uncompressed.

Fixed.

* I think you should write something like "Included attributes always support index-only scans." rather than "Included attributes can be returned." in the comment for gistcanreturn().

Fixed, but slightly reworded.

* Why the random noise in the diff below? I think using "(c3) INCLUDE (c4)" for both gist and rtree results in a cleaner patch.

I've used columns with and without opclass in INCLUDE. This led to these seemingly random changes.

PFA v6. Thanks for reviewing!

Best regards, Andrey Borodin.

Attachments:

0001-Covering-GiST-v6.patchapplication/octet-stream; name=0001-Covering-GiST-v6.patch; x-unix-mode=0644Download+315-36
#18Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andrey Borodin (#17)
Re: Covering GiST indexes

On 1/28/19 7:26 PM, Andrey Borodin wrote:

* I am no fan of the tupdesc vs truncTupdesc separation and think that it is a potential hazard, but I do not have any better suggestion right now.

B-tree is copying tupdesc every time they truncate tuple. We need tuple truncation a little more often: when we are doing page split, we have to form all page tuples, truncated.
Initially, I've optimized only this case, but this led to prepared tupledesc for truncated tuples.

* There is no test case for exclusion constraints, and I feel since that is one of the more important uses we should probably have at least one such test case.

Actually, I did not understand this test case. Can you elaborate more on this? How included attributes should participate in exclude index? What for?

I mean include a table like below among the tests. I feel like this is a
main use case for INCLUDE.

CREATE TABLE t2 (
x int4range,
y int,

EXCLUDE USING gist (x WITH &&) INCLUDE (y)
);

* Why the random noise in the diff below? I think using "(c3) INCLUDE (c4)" for both gist and rtree results in a cleaner patch.

I've used columns with and without opclass in INCLUDE. This led to these seemingly random changes.

I mean the diff would be smaller as the below. It also may make sense to
make both lines "(c3) INCLUDE (c1, c4)".

  CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
  CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
  CREATE INDEX on tbl USING gist(c3) INCLUDE (c4);
  CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
  CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
  CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
-CREATE INDEX on tbl USING rtree(c1, c2) INCLUDE (c3, c4);
+CREATE INDEX on tbl USING rtree(c3) INCLUDE (c4);
  CREATE INDEX on tbl USING btree(c1, c2) INCLUDE (c3, c4);

Andreas

#19Andrey Borodin
amborodin@acm.org
In reply to: Andreas Karlsson (#18)
Re: Covering GiST indexes

29 янв. 2019 г., в 7:32, Andreas Karlsson <andreas@proxel.se> написал(а):

On 1/28/19 7:26 PM, Andrey Borodin wrote:

* I am no fan of the tupdesc vs truncTupdesc separation and think that it is a potential hazard, but I do not have any better suggestion right now.

B-tree is copying tupdesc every time they truncate tuple. We need tuple truncation a little more often: when we are doing page split, we have to form all page tuples, truncated.
Initially, I've optimized only this case, but this led to prepared tupledesc for truncated tuples.

* There is no test case for exclusion constraints, and I feel since that is one of the more important uses we should probably have at least one such test case.

Actually, I did not understand this test case. Can you elaborate more on this? How included attributes should participate in exclude index? What for?

I mean include a table like below among the tests. I feel like this is a main use case for INCLUDE.

CREATE TABLE t2 (
x int4range,
y int,

EXCLUDE USING gist (x WITH &&) INCLUDE (y)
);

Thanks for the explanation. Added this as case 6 to index_including_gist.

* Why the random noise in the diff below? I think using "(c3) INCLUDE (c4)" for both gist and rtree results in a cleaner patch.

I've used columns with and without opclass in INCLUDE. This led to these seemingly random changes.

I mean the diff would be smaller as the below. It also may make sense to make both lines "(c3) INCLUDE (c1, c4)".

CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
CREATE INDEX on tbl USING gist(c3) INCLUDE (c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
-CREATE INDEX on tbl USING rtree(c1, c2) INCLUDE (c3, c4);
+CREATE INDEX on tbl USING rtree(c3) INCLUDE (c4);
CREATE INDEX on tbl USING btree(c1, c2) INCLUDE (c3, c4);

I've took your version of this test and added all variations of included attributes.

PFA v7.

Thanks!

Best regards, Andrey Borodin.

Attachments:

0001-Covering-GiST-v7.patchapplication/octet-stream; name=0001-Covering-GiST-v7.patch; x-unix-mode=0644Download+357-32
#20Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andrey Borodin (#19)
Re: Covering GiST indexes

Thanks for the new version of the patch. Based on my knowledge of PG
this is starting to look good, and I have only three small comments below.

I am not 100% a fan of truncTupdesc, but as long as it is well commented
I think that it is fine.

= Review

* I think it is worth writing a short comment when you create
truncTupdesc about why this is done.

* Very minor thing: the diff below is pointless churn on a line not
touched by the patch.

-                        values, isnull, true /* size is currently bogus 
*/ );
+                        values, isnull, true /* size is currently bogus 
*/);

* Another very minor thing: The diff below from gistFormTuple() should
probably be consistent about brackets.

+           if (isnull[i])
+               compatt[i] = (Datum) 0;
+           else
+           {
+               compatt[i] = attdata[i];
+           }

Andreas

#21Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andreas Karlsson (#20)
#22Andrey Borodin
amborodin@acm.org
In reply to: Andreas Karlsson (#20)
#23Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andrey Borodin (#22)
#24Andrey Borodin
amborodin@acm.org
In reply to: Andreas Karlsson (#23)
#25Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Andrey Borodin (#24)
#26Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andreas Karlsson (#25)
#27Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#26)
#28Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#27)