GSoC 2011: Fast GiST index build

Started by Alexander Korotkov · 10 messages
#1Alexander Korotkov
aekorotkov@gmail.com

Hackers!

I was happy to learn that my proposal "Fast GiST index build" was accepted
for GSoC 2011! Thank you very much for your support! Especially thanks to
Heikki Linnakangas for becoming my mentor!

The first question that I would like to discuss is the node buffer storage.
During the index build each index page (except leaf pages) should have several
pages of buffer. So my question is where to store the buffers and how to
operate on them? It is somewhat similar to the GIN fastupdate buffer, but
there are differences. First, we have to deal with many buffers instead of
only one. Second, I believe we don't need to worry about concurrency so much,
because the algorithm assumes performing relatively large operations in memory
(relocating entries between several buffers). That requires locking all of the
currently operated buffers. I'm going to store the buffers separately from the
index itself, because we should free all of them when the index is built.

I found a very simple solution for dealing with varlena keys. The greatest
buffer size and the minimal level step are achieved when the key size is
minimal. Thus, the minimal key size is the worst case. Since the minimal
varlena size is 4 bytes, we can use it in the initial calculations. I'm going
to rely on this assumption in the first implementation.
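
For illustration, the worst-case sizing could be computed roughly like this
(BLCKSZ, VARHDRSZ, MAXALIGN, SizeOfPageHeaderData and the tuple/line-pointer
structs are the usual PostgreSQL definitions; the function name and the exact
formula are only a sketch of the idea, not final code):

#include "postgres.h"
#include "access/gist.h"
#include "access/itup.h"
#include "storage/bufpage.h"
#include "storage/itemid.h"

static int
worstCaseEntriesPerPage(void)
{
    /* worst case: every key is a minimal varlena, i.e. just its 4-byte header */
    Size    min_key_size = VARHDRSZ;
    Size    min_tuple_size = MAXALIGN(sizeof(IndexTupleData) + min_key_size);
    Size    free_space = BLCKSZ - SizeOfPageHeaderData
                         - MAXALIGN(sizeof(GISTPageOpaqueData));

    return free_space / (min_tuple_size + sizeof(ItemIdData));
}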

----
With best regards,
Alexander Korotkov.

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#1)
Re: GSoC 2011: Fast GiST index build

Congrats on being selected, looking forward to mentor you!

On 25.04.2011 23:09, Alexander Korotkov wrote:

The first question that I would like to discuss is the node buffer storage.
During the index build each index page (except leaf pages) should have several
pages of buffer. So my question is where to store the buffers and how to
operate on them? It is somewhat similar to the GIN fastupdate buffer, but
there are differences. First, we have to deal with many buffers instead of
only one. Second, I believe we don't need to worry about concurrency so much,
because the algorithm assumes performing relatively large operations in memory
(relocating entries between several buffers). That requires locking all of the
currently operated buffers. I'm going to store the buffers separately from the
index itself, because we should free all of them when the index is built.

Just palloc() the buffers in memory, at least in the first phase. That'll
work fine for index creation. Dealing with concurrent searches and inserts
makes it a lot more complicated; it's better to make it work for index
creation first, and investigate something like the GIN fastupdate buffers
later if you have time left.
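
Something along these lines, just to illustrate what I mean; the struct and
function names here are made up, only the memory context and list calls are
the real APIs:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "storage/block.h"

/* one in-memory buffer per internal node (illustrative only) */
typedef struct GISTNodeBuffer
{
    BlockNumber nodeBlkno;  /* internal page this buffer belongs to */
    List       *itups;      /* IndexTuples waiting to be pushed down */
} GISTNodeBuffer;

static GISTNodeBuffer *
createNodeBuffer(MemoryContext buildCxt, BlockNumber blkno)
{
    GISTNodeBuffer *buf;

    buf = (GISTNodeBuffer *) MemoryContextAlloc(buildCxt, sizeof(GISTNodeBuffer));
    buf->nodeBlkno = blkno;
    buf->itups = NIL;
    return buf;
}

Adding a tuple is then just a lappend() in the build context, and everything
is freed in one go by deleting that context when the build finishes.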

I found a very simple solution for dealing with varlena keys. The greatest
buffer size and the minimal level step are achieved when the key size is
minimal. Thus, the minimal key size is the worst case. Since the minimal
varlena size is 4 bytes, we can use it in the initial calculations. I'm going
to rely on this assumption in the first implementation.

Ok, good.

The first priority should be to have something that works enough to be
benchmarked. The paper you referred to in the GSoC application [1]
contained empirical results on the number of I/O operations needed with
the algorithm, but it didn't take the operating system cache into account at
all. That makes the empirical results next to worthless; keeping some
stuff in in-memory buffers is obviously going to reduce I/O if you don't
take the OS cache into account.

So we're going to need benchmark results that show a benefit, or there's
no point in doing this at all. The sooner we get to benchmarking, even
with a very limited and buggy version of the patch, the better. If the
algorithm described in that paper doesn't give much benefit, you might
have to switch to some other algorithm halfway through the project.
Fortunately, there are plenty of R-tree bulk loading algorithms in the
literature; it should be possible to adapt some of them to GiST.

[1]: http://dx.doi.org/10.1007/s00453-001-0107-6

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: GSoC 2011: Fast GiST index build

On Tue, Apr 26, 2011 at 10:46 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:

Just palloc() the buffers in memory, at least in the first phase. That'll
work fine for index creation. Dealing with concurrent searches and inserts
makes it a lot more complicated; it's better to make it work for index
creation first, and investigate something like the GIN fastupdate buffers
later if you have time left.

Since the algorithm is focused on reducing I/O, we should expect the best
speedup in the case when the index doesn't fit in memory. The total size of
the buffers is comparable to the size of the whole index. This means that if
we can hold the buffers in memory, then we can mostly hold the whole index in
memory as well. That's why I think we should have simple on-disk buffer
management for the first representative benchmark.

The first priority should be to have something that works enough to be
benchmarked. The paper you referred to in the GSoC application [1]
contained empirical results on the number of I/O operations needed with the
algorithm, but it didn't take the operating system cache into account at all.
That makes the empirical results next to worthless; keeping some stuff in
in-memory buffers is obviously going to reduce I/O if you don't take the OS
cache into account.

So we're going to need benchmark results that show a benefit, or there's no
point in doing this at all. The sooner we get to benchmarking, even with a
very limited and buggy version of the patch, the better. If the algorithm
described in that paper doesn't give much benefit, you might have to switch
to some other algorithm halfway through the project. Fortunately, there are
plenty of R-tree bulk loading algorithms in the literature; it should be
possible to adapt some of them to GiST.

[1] http://dx.doi.org/10.1007/s00453-001-0107-6

Yes, these priorities seem very reasonable. We should get a first confirmation
of effectiveness as soon as possible. I'll stick to this priority.

----
With best regards,
Alexander Korotkov.

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#3)
Re: GSoC 2011: Fast GiST index build

On Tue, Apr 26, 2011 at 1:10 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Since the algorithm is focused on reducing I/O, we should expect the best
speedup in the case when the index doesn't fit in memory. The total size of
the buffers is comparable to the size of the whole index. This means that if
we can hold the buffers in memory, then we can mostly hold the whole index in
memory as well. That's why I think we should have simple on-disk buffer
management for the first representative benchmark.

Since we need to free all the buffers after the index is built, I believe
the buffers should be stored separately. Otherwise the index becomes bloated
immediately after creation. We probably need to create a temporary relation
to store the buffers in. If my thought is right, is there any example of
using a temporary relation?

----
With best regards,
Alexander Korotkov.

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#4)
Re: GSoC 2011: Fast GiST index build

On 27.04.2011 09:51, Alexander Korotkov wrote:

On Tue, Apr 26, 2011 at 1:10 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Since the algorithm is focused on reducing I/O, we should expect the best
speedup in the case when the index doesn't fit in memory. The total size of
the buffers is comparable to the size of the whole index. This means that if
we can hold the buffers in memory, then we can mostly hold the whole index in
memory as well. That's why I think we should have simple on-disk buffer
management for the first representative benchmark.

Since we need to free all the buffers after the index is built, I believe
the buffers should be stored separately. Otherwise the index becomes bloated
immediately after creation. We probably need to create a temporary relation
to store the buffers in. If my thought is right, is there any example of
using a temporary relation?

A temporary relation is a bit heavy-weight for this; a temporary file
should be enough. See src/backend/storage/file/buffile.c, the
BufFileCreateTemp() function in particular. Or perhaps a tuplestore
suits better, see src/backend/utils/sort/tuplestore.c; that's simpler to
use if you're storing tuples. tuplestore only supports storing heap
tuples at the moment, but it could easily be extended to store index
tuples, like tuplesort.c does.
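
To give an idea, spilling index tuples to such a temp file could look roughly
like this. BufFileCreateTemp/BufFileWrite/BufFileRead/BufFileClose are the
real buffile.c functions; the length-prefix framing of the tuples is just one
possible scheme:

#include "postgres.h"
#include "access/itup.h"
#include "storage/buffile.h"

/* append one index tuple to the spill file */
static void
spillIndexTuple(BufFile *file, IndexTuple itup)
{
    Size    len = IndexTupleSize(itup);

    BufFileWrite(file, &len, sizeof(len));
    BufFileWrite(file, itup, len);
}

/* read the next tuple back; returns NULL at end of file */
static IndexTuple
readIndexTuple(BufFile *file)
{
    Size        len;
    IndexTuple  itup;

    if (BufFileRead(file, &len, sizeof(len)) != sizeof(len))
        return NULL;
    itup = (IndexTuple) palloc(len);
    BufFileRead(file, itup, len);
    return itup;
}

The file itself would be created with BufFileCreateTemp(false), rewound with
BufFileSeek() before re-reading, and released with BufFileClose() once the
index build is done.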

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#5)
Re: GSoC 2011: Fast GiST index build

While studying the existing GiST code, I have a question. The
gistFindCorrectParent function has a branch with the following comment:
/*
* awful!!, we need search tree to find parent ... , but before we
* should release all old parent
*/
Can you provide an example of a case in which this branch is taken?

----
With best regards,
Alexander Korotkov.

#7Teodor Sigaev
teodor@sigaev.ru
In reply to: Alexander Korotkov (#6)
Re: GSoC 2011: Fast GiST index build

The gistFindCorrectParent function has a branch with the following comment:
/*
* awful!!, we need search tree to find parent ... , but before we
* should release all old parent
*/
Can you provide an example of a case in which this branch is taken?

See several lines above:
if (parent->blkno == InvalidBlockNumber)

/*
* end of chain and still didn't found parent, It's very-very
* rare situation when root splited
*/
break;

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#8Alexander Korotkov
aekorotkov@gmail.com
In reply to: Teodor Sigaev (#7)
Re: GSoC 2011: Fast GiST index build

2011/5/5 Teodor Sigaev <teodor@sigaev.ru>

See several lines above:
if (parent->blkno == InvalidBlockNumber)

/*
* end of chain and still didn't found parent, It's very-very
* rare situation when root splited
*/
break;

As I understood it, this is because we can't move the root to another page.

----
With best regards,
Alexander Korotkov.

#9Teodor Sigaev
teodor@sigaev.ru
In reply to: Alexander Korotkov (#8)
Re: GSoC 2011: Fast GiST index build

As I understood it, this is because we can't move the root to another page.

Actually, no. The path to a node can change completely. For example, for a
page on the second level the path is 0-234 (where 0 is the root page and 234
is a page on the second level). After a root split the path becomes
0-new_page-234.

If the algorithm were able to move the root, the new path could look like
new_root-new_page-234, because the old root could be split into old_root_page
and new_page.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Teodor Sigaev (#9)
Re: GSoC 2011: Fast GiST index build

2011/5/6 Teodor Sigaev <teodor@sigaev.ru>

As I understood it, this is because we can't move the root to another page.

Actually, no. The path to a node can change completely. For example, for a
page on the second level the path is 0-234 (where 0 is the root page and 234
is a page on the second level). After a root split the path becomes
0-new_page-234.

If the algorithm were able to move the root, the new path could look like
new_root-new_page-234, because the old root could be split into old_root_page
and new_page.

OK, thank you for the explanation.

----
With best regards,
Alexander Korotkov.