GiST "choose subtree" support function to inline penalty

Started by Darafei "Komяpa" Praliaskouskiover 6 years ago5 messages

Hi,

I'm looking at PostGIS geometry GiST index build times and try to optimize
withing the current GiST framework. The function that shows a lot on my
flame graphs is penalty.

I spent weekend rewriting PostGIS penalty to be as fast as possible.
(FYI https://github.com/postgis/postgis/pull/425/files)

However I cannot get any meaningfully faster build time. Even when I strip
it to "just return edge extension" index build time is the same.

Is there a way to inline the penalty into above "choose subtree" loop
somehow? That would also let us stop bit-fiddling floats to simulate a more
complex choosing scheme.

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

#2Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Darafei "Komяpa" Praliaskouski (#1)
Re: GiST "choose subtree" support function to inline penalty

Hi!

24 июня 2019 г., в 15:08, Darafei Komяpa Praliaskouski <me@komzpa.net> написал(а):

I'm looking at PostGIS geometry GiST index build times and try to optimize withing the current GiST framework. The function that shows a lot on my flame graphs is penalty.

I spent weekend rewriting PostGIS penalty to be as fast as possible.
(FYI https://github.com/postgis/postgis/pull/425/files)

However I cannot get any meaningfully faster build time. Even when I strip it to "just return edge extension" index build time is the same.

Is there a way to inline the penalty into above "choose subtree" loop somehow? That would also let us stop bit-fiddling floats to simulate a more complex choosing scheme.

Maybe we could just add new opclass function for choosing subtree?
I've created GSoC item for this[0]https://wiki.postgresql.org/wiki/GSoC_2019#GiST_API_advancement_.282019.29.

Best regards, Andrey Borodin.

[0]: https://wiki.postgresql.org/wiki/GSoC_2019#GiST_API_advancement_.282019.29

#3Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Andrey Borodin (#2)
Re: GiST "choose subtree" support function to inline penalty

On Mon, Jun 24, 2019 at 2:31 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

24 июня 2019 г., в 15:08, Darafei Komяpa Praliaskouski <me@komzpa.net>

написал(а):

I'm looking at PostGIS geometry GiST index build times and try to

optimize withing the current GiST framework. The function that shows a lot
on my flame graphs is penalty.

I spent weekend rewriting PostGIS penalty to be as fast as possible.
(FYI https://github.com/postgis/postgis/pull/425/files)

However I cannot get any meaningfully faster build time. Even when I

strip it to "just return edge extension" index build time is the same.

Is there a way to inline the penalty into above "choose subtree" loop

somehow? That would also let us stop bit-fiddling floats to simulate a more
complex choosing scheme.

Maybe we could just add new opclass function for choosing subtree?

+1,
This sounds reasonable. Authors of existing GiST opclasses wouldn't have
trouble to keep compatible with new PostgreSQL versions.

I see one more use case for "choose subtree" instead "penalty". When
R*-tree chooses subtree, it considers to only extension of selected
bounding box, but also overlap increase of bounding boxes. This strategy
should have a positive effect of tree quality, besides I never seen it has
been measured separately. It probably kind of possible to implement using
"penalty" method assuming you have reference to the page in GISTENTRY. But
that doesn't seems a correct way to use the GiST interface. Additionally,
you don't know the attribute number to get the correct key in multicolumn
indexes. Having "choose subtree" method will make it possible to implement
this strategy in correct way. However, this use case is kind of opposite
to Darafei's one, because it should make choosing subtree slower (but
better).

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Darafei "Komяpa" Praliaskouski (#1)
Re: GiST "choose subtree" support function to inline penalty

=?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= <me@komzpa.net> writes:

I'm looking at PostGIS geometry GiST index build times and try to optimize
withing the current GiST framework. The function that shows a lot on my
flame graphs is penalty.

I spent weekend rewriting PostGIS penalty to be as fast as possible.
(FYI https://github.com/postgis/postgis/pull/425/files)

However I cannot get any meaningfully faster build time. Even when I strip
it to "just return edge extension" index build time is the same.

TBH this makes me wonder whether the real problem isn't so much "penalty
function is too slow" as "penalty function is resulting in really bad
index splits".

It might be that giving the opclass higher-level control over the split
decision can help with both aspects. But never start micro-optimizing
an algorithm until you're sure it's the right algorithm.

regards, tom lane

In reply to: Tom Lane (#4)
Re: GiST "choose subtree" support function to inline penalty

On Thu, Jun 27, 2019 at 6:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

=?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= <me@komzpa.net>
writes:

I'm looking at PostGIS geometry GiST index build times and try to

optimize

withing the current GiST framework. The function that shows a lot on my
flame graphs is penalty.

I spent weekend rewriting PostGIS penalty to be as fast as possible.
(FYI https://github.com/postgis/postgis/pull/425/files)

However I cannot get any meaningfully faster build time. Even when I

strip

it to "just return edge extension" index build time is the same.

TBH this makes me wonder whether the real problem isn't so much "penalty
function is too slow" as "penalty function is resulting in really bad
index splits".

As an extension writer I don't have much control on how Postgres calls
penalty function. PostGIS box is using floats instead of doubles, so its
size is twice as small as postgres builtin box, meaning penalty is called
even more often on better packed pages.

I can get index construction speed to be much faster if I break penalty to
actually result in horrible splits: index size grows 50%, construction is
30% faster.

It might be that giving the opclass higher-level control over the split
decision can help with both aspects.

Please note the question is not about split. Korotkov's split is working
fine. Issue is with penalty and computations required for choosing the
subtree before split happens.

Andrey Borodin proposed off-list that we can provide our own index type
that is a copy of GiST but with penalty inlined into "choose subtree" code
path, as that seems to be the only way to do it in PG12. Is there a more
humane option than forking GiST?

But never start micro-optimizing
an algorithm until you're sure it's the right algorithm.

That's exactly the reason I write original letter. I don't see any option
for further optimization in existing GiST framework, but this optimization
is needed: waiting 10 hours for GiST to build after an hour of ingesting
the dataset is frustrating, especially when you see a nearby b-tree done in
an hour.

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa