WIP: Fast GiST index build
Hackers,
A WIP patch for fast GiST index build is attached. The code is dirty and comments
are lacking, but it works. It is now ready for the first benchmarks, which
should prove the efficiency of the selected technique. It's time to compare the fast
GiST index build with the repeated-insert build on large enough datasets (datasets
which don't fit in cache). The aims of testing are:
1) Measure the acceleration of index build.
2) Measure the change in index quality.
I'm going to do the first testing using synthetic datasets. Everybody who has
interesting real-life datasets for testing is welcome.
------
With best regards,
Alexander Korotkov.
Attachments:
gist_fast_build-0.0.3.patch.gz (application/x-gzip)
On 03.06.2011 14:02, Alexander Korotkov wrote:
Hackers,
WIP patch of fast GiST index build is attached. [...]
I did some quick performance testing of this. I installed PostGIS 1.5,
and loaded an extract of the OpenStreetMap data covering Finland. The
biggest GiST index in that data set is the idx_nodes_geom index on the nodes
table. I have maintenance_work_mem and shared_buffers both set to 512
MB, and this laptop has 4 GB of RAM.
Without the patch, reindexing the index takes about 170 seconds and the
index size is 321 MB. And with the patch, it takes about 150 seconds,
and the resulting index size is 319 MB.
The nodes table is 618MB in size, so it fits in RAM. I presume the gain
would be bigger if it doesn't, as the random I/O to update the index
starts to hurt more. But this shows that even when it does, this patch
helps a little bit, and the resulting index size is comparable.
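For reference, the timing and size figures above can be gathered with a psql session along these lines (a sketch; idx_nodes_geom is the index named above, and \timing is a psql meta-command):

```sql
-- Sketch for reproducing the measurement above; assumes the PostGIS
-- schema with the idx_nodes_geom index is already loaded.
\timing on
REINDEX INDEX idx_nodes_geom;                               -- reports elapsed time
SELECT pg_size_pretty(pg_relation_size('idx_nodes_geom'));  -- index size
```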
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 03.06.2011 14:02, Alexander Korotkov wrote:
Hackers,
WIP patch of fast GiST index build is attached. [...]
I ran another test with a simple table generated with:
CREATE TABLE pointtest (p point);
INSERT INTO pointtest SELECT point(random(), random()) FROM
generate_series(1,50000000);
Generating a gist index with:
CREATE INDEX i_pointtest ON pointtest USING gist (p);
took about 15 hours without the patch, and 2 hours with it. That's quite
dramatic.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 06.06.2011 10:42, Heikki Linnakangas wrote:
On 03.06.2011 14:02, Alexander Korotkov wrote:
Hackers,
WIP patch of fast GiST index build is attached. [...]
I ran another test with a simple table generated with:
CREATE TABLE pointtest (p point);
INSERT INTO pointtest SELECT point(random(), random()) FROM
generate_series(1,50000000);
Generating a gist index with:
CREATE INDEX i_pointtest ON pointtest USING gist (p);
took about 15 hours without the patch, and 2 hours with it. That's quite
dramatic.
Oops, that was a rounding error, sorry. The run took about 2.7 hours
with the patch, which of course should be rounded to 3 hours, not 2.
Anyway, it is still a very impressive improvement.
I'm glad you could get the patch ready for benchmarking this quickly.
Now you just need to get the patch into shape so that it can be
committed. That is always the more time-consuming part, so I'm glad you
have plenty of time left for it.
Could you please create a TODO list on the wiki page, listing all the
missing features, known bugs etc. that will need to be fixed? That'll
make it easier to see how much work there is left. It'll also help
anyone looking at the patch to know which issues are known issues.
Meanwhile, it would still be very valuable if others could test this
with different workloads. And Alexander, it would be good if at some
point you could write some benchmark scripts too, and put them on the
wiki page, just to see what kind of workloads have been taken into
consideration and tested already. Do you think there's some worst-case
data distributions where this algorithm would perform particularly badly?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Hi!
On Mon, Jun 6, 2011 at 2:51 PM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:
On 06.06.2011 10:42, Heikki Linnakangas wrote:
I ran another test with a simple table generated with:
CREATE TABLE pointtest (p point);
INSERT INTO pointtest SELECT point(random(), random()) FROM
generate_series(1,50000000);
Generating a gist index with:
CREATE INDEX i_pointtest ON pointtest USING gist (p);
took about 15 hours without the patch, and 2 hours with it. That's quite
dramatic.
Oops, that was a rounding error, sorry. The run took about 2.7 hours with
the patch, which of course should be rounded to 3 hours, not 2. Anyway, it
is still a very impressive improvement.
I have similar results on 100 million rows: 21.6 hours without the patch and
2 hours with it. But I found a problem: index quality is worse. See the
following query plans. Here, test is the relation where the index was created in
the ordinary way, and test2 is the relation where the patch was used.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=4391.01..270397.31 rows=100000 width=20)
(actual time=1.257..2.147 rows=838 loops=1)
Recheck Cond: (v <@ '(0.903,0.203),(0.9,0.2)'::box)
Buffers: shared hit=968
-> Bitmap Index Scan on test_idx (cost=0.00..4366.01 rows=100000
width=0) (actual time=1.162..1.162 rows=838 loops=1)
Index Cond: (v <@ '(0.903,0.203),(0.9,0.2)'::box)
Buffers: shared hit=131
Total runtime: 2.214 ms
(7 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=4370.84..270377.13 rows=100000 width=20)
(actual time=5.252..6.056 rows=838 loops=1)
Recheck Cond: (v <@ '(0.903,0.203),(0.9,0.2)'::box)
Buffers: shared hit=1458
-> Bitmap Index Scan on test2_idx (cost=0.00..4345.84 rows=100000
width=0) (actual time=5.155..5.155 rows=838 loops=1)
Index Cond: (v <@ '(0.903,0.203),(0.9,0.2)'::box)
Buffers: shared hit=621
Total runtime: 6.121 ms
(7 rows)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=4391.01..270397.31 rows=100000 width=20)
(actual time=2.148..2.977 rows=850 loops=1)
Recheck Cond: (v <@ '(0.503,0.503),(0.5,0.5)'::box)
Buffers: shared hit=1099
-> Bitmap Index Scan on test_idx (cost=0.00..4366.01 rows=100000
width=0) (actual time=2.052..2.052 rows=850 loops=1)
Index Cond: (v <@ '(0.503,0.503),(0.5,0.5)'::box)
Buffers: shared hit=249
Total runtime: 3.033 ms
(7 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=4370.84..270377.13 rows=100000 width=20)
(actual time=6.806..7.602 rows=850 loops=1)
Recheck Cond: (v <@ '(0.503,0.503),(0.5,0.5)'::box)
Buffers: shared hit=1615
-> Bitmap Index Scan on test2_idx (cost=0.00..4345.84 rows=100000
width=0) (actual time=6.709..6.709 rows=850 loops=1)
Index Cond: (v <@ '(0.503,0.503),(0.5,0.5)'::box)
Buffers: shared hit=773
Total runtime: 7.667 ms
(7 rows)
We can see that the index scan requires reading several times more
pages. The original paper notes this effect and explains it by the routing
rectangles being laid out in less optimal ways. But the effect wasn't so dramatic in the tests
provided in the paper. So, I have the following thoughts about this problem:
1) The number of pages read from the index is too large even with the
ordinary index build. Querying a small area requires reading hundreds of
pages. It is probably caused by the picksplit implementation. I have a version of the
picksplit algorithm which seems to be much more efficient. I'll do some
benchmarks with my picksplit algorithm. I hope the difference in index quality
will then not be so dramatic.
2) I can try to make some enhancements to the fast build algorithm which could
improve tree quality. In the original paper the Hilbert heuristic was used to achieve
even better tree quality than a tree created in the ordinary way. But
since we use GiST we are restricted by its interface (or we would have to create
new interface function(s), which I would like to avoid). I would like to try
ordering by penalty value in the buffer emptying process, and buffer
relocation on split.
3) Probably, there is some bug which affects tree quality.
Could you please create a TODO list on the wiki page, listing all the
missing features, known bugs etc. that will need to be fixed? That'll make
it easier to see how much work there is left. It'll also help anyone looking
at the patch to know which issues are known issues.
Sure. I'll create such a list on the wiki page. I believe that currently the most
important issue is index quality.
Meanwhile, it would still be very valuable if others could test this with
different workloads. And Alexander, it would be good if at some point you
could write some benchmark scripts too, and put them on the wiki page, just
to see what kind of workloads have been taken into consideration and tested
already. Do you think there's some worst-case data distributions where this
algorithm would perform particularly badly?
I don't expect any bad cases in terms of I/O. My main worry is about
index quality, which is strongly related to data distribution.
------
With best regards,
Alexander Korotkov.
On Mon, Jun 6, 2011 at 2:51 PM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:
Do you think there's some worst-case data distributions where this
algorithm would perform particularly badly?
I think there could be some worst-case GiST applications. Right now the GiST fast
build algorithm invokes more penalty calls than the repeated-insert algorithm.
If I succeed then it will invoke even more such calls. So, if the penalty
function is very slow, then GiST fast build will be slower than
repeated insert.
------
With best regards,
Alexander Korotkov.
On Mon, Jun 6, 2011 at 4:14 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
If I succeed then it will invoke even more such calls.
I meant here that if I succeed with the enhancements which improve index quality,
then the fast build algorithm will invoke even more such calls.
------
With best regards,
Alexander Korotkov.
I've tried sorting index tuples by penalty function value before buffer relocation
on split, but without any success: index quality becomes even worse
than without sorting.
The next thing I tried is buffer relocation between all neighbor buffers.
The results of the first tests are much more promising: the number of page accesses
during an index scan is similar to that without fast index build. I'm going to
stick with this approach.
test=# create index test_idx on test using gist(v);
NOTICE: Level step = 1, pagesPerBuffer = 406
CREATE INDEX
Time: 10002590,469 ms
test=# select pg_size_pretty(pg_relation_size('test_idx'));
pg_size_pretty
----------------
6939 MB
(1 row)
test=# explain (analyze, buffers) select * from test where v <@
'(0.903,0.203),(0.9,0.2)'::box;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=4366.78..258752.22 rows=100000 width=16)
(actual time=1.412..2.295 rows=897 loops=1)
Recheck Cond: (v <@ '(0.903,0.203),(0.9,0.2)'::box)
Buffers: shared hit=1038
-> Bitmap Index Scan on test_idx (cost=0.00..4341.78 rows=100000
width=0) (actual time=1.311..1.311 rows=897 loops=1)
Index Cond: (v <@ '(0.903,0.203),(0.9,0.2)'::box)
Buffers: shared hit=141
Total runtime: 2.375 ms
(7 rows)
test=# explain (analyze, buffers) select * from test where v <@
'(0.503,0.503),(0.5,0.5)'::box;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=4366.78..258752.22 rows=100000 width=16)
(actual time=2.113..2.972 rows=855 loops=1)
Recheck Cond: (v <@ '(0.503,0.503),(0.5,0.5)'::box)
Buffers: shared hit=1095
-> Bitmap Index Scan on test_idx (cost=0.00..4341.78 rows=100000
width=0) (actual time=2.016..2.016 rows=855 loops=1)
Index Cond: (v <@ '(0.503,0.503),(0.5,0.5)'::box)
Buffers: shared hit=240
Total runtime: 3.043 ms
(7 rows)
------
With best regards,
Alexander Korotkov.
Attachments:
gist_fast_build-0.1.0.patch.gz (application/x-gzip)
On Wed, Jun 15, 2011 at 11:21 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've tried index tuples sorting on penalty function before buffer
relocation on split. But it was without any success. Index quality becomes
even worse than without sorting.
The next thing I've tried is buffer relocation between all neighbor
buffers. Results of first tests is much more promising. Number of page
accesses during index scan is similar to those without fast index build. I'm
going to hold on this approach.
test=# create index test_idx on test using gist(v);
NOTICE: Level step = 1, pagesPerBuffer = 406
CREATE INDEX
Time: 10002590,469 ms
I forgot to mention that the build time increases by about 40%, but it is still
about 10 times faster than the ordinary build.
------
With best regards,
Alexander Korotkov.
On 15.06.2011 10:24, Alexander Korotkov wrote:
On Wed, Jun 15, 2011 at 11:21 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've tried index tuples sorting on penalty function before buffer
relocation on split. But it was without any success. Index quality becomes
even worse than without sorting.
The next thing I've tried is buffer relocation between all neighbor
buffers. Results of first tests is much more promising. Number of page
accesses during index scan is similar to those without fast index build. I'm
going to hold on this approach.
test=# create index test_idx on test using gist(v);
NOTICE: Level step = 1, pagesPerBuffer = 406
CREATE INDEX
Time: 10002590,469 ms
I forget to say that build time increases in about 40%, but it is still
faster than ordinal build in about 10 times.
Is this relocation mechanism something that can be tuned, for different
tradeoffs between index quality and build time? In any case, it seems
that we're going to need a lot of testing with different data sets to
get a better picture of how this performs. But at least for now, it
looks like this approach is going to be acceptable.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, Jun 15, 2011 at 12:03 PM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:
Is this relocation mechanism something that can be tuned, for different
tradeoffs between index quality and build time?
Yes, it can. I believe it can be made an index parameter.
In any case, it seems that we're going to need a lot of testing with
different data sets to get a better picture of how this performs.
Sure. My problem is that I don't have large enough real-life datasets. The picture
from synthetic datasets can be unrepresentative of real-life cases. On the smaller
datasets that I do have, I can really only compare index quality. Also, tests
with large datasets take a long time, especially without fast build. Probably the
solution is to limit the cache size during testing. That should allow measuring the
I/O benefit even on relatively small datasets. But so far I don't know how to
do that on Linux.
------
With best regards,
Alexander Korotkov.
Actually, I would like to measure CPU and I/O load independently for more
comprehensive benchmarks. Can you advise me on some appropriate tools for this?
------
With best regards,
Alexander Korotkov.
My current idea is to measure the number of I/O accesses with pg_stat_statements
and to measure CPU usage via /proc/PID/stat. Any thoughts?
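For the I/O side, the pg_stat_statements idea might look roughly like this (a sketch; it assumes the contrib module is installed and loaded via shared_preload_libraries):

```sql
-- Sketch: per-statement buffer I/O counters from pg_stat_statements.
-- The shared_blks_* columns are available in 9.0 and later.
SELECT query, calls, shared_blks_hit, shared_blks_read
FROM pg_stat_statements
WHERE query LIKE 'CREATE INDEX%'
ORDER BY shared_blks_read DESC;
```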
------
With best regards,
Alexander Korotkov.
On Thu, Jun 16, 2011 at 1:33 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Actually, I would like to measure CPU and IO load independently for more
comprehensive benchmarks. Can you advice me some appropriate tools for it?
------
With best regards,
Alexander Korotkov.
On 16.06.2011 21:13, Alexander Korotkov wrote:
My current idea is to measure number of IO accesses by pg_stat_statements
and measure CPU usage by /proc/PID/stat. Any thoughts?
Actually, you get both of those very easily with:
set log_statement_stats=on
LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
! 0.000990 elapsed 0.000000 user 0.000000 system sec
! [0.000000 user 0.008000 sys total]
! 0/0 [32/0] filesystem blocks in/out
! 0/0 [0/959] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [10/1] voluntary/involuntary context switches
STATEMENT: SELECT generate_series(1,100);
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Oh, actually it's so easy. Thanks.
------
With best regards,
Alexander Korotkov.
A new version of the patch is attached. There are some bugfixes, minor refactoring, and comments
(unfortunately, not all of the code is covered by comments yet). Also, a
"fastbuild" parameter was added to the GiST index. It allows testing index
builds with and without fast build, without recompiling postgres.
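Usage presumably looks something like the following; only the "fastbuild" parameter name comes from the description above, and the WITH (...) reloption syntax is my assumption:

```sql
-- Sketch: toggling the build strategy per index. The reloption
-- syntax here is assumed for illustration.
CREATE INDEX test_idx ON test USING gist (v) WITH (fastbuild = on);
-- For comparison, a repeated-insert build of the same data:
CREATE INDEX test_idx2 ON test USING gist (v) WITH (fastbuild = off);
```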
------
With best regards,
Alexander Korotkov.
Hi!
I've created section about testing in project wiki page:
http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011#Testing_results
Do you have any notes about the table structure?
As you can see, I found that CPU usage can be much higher
with gist_trgm_ops. I believe it's due to the relatively expensive penalty
method in that opclass. But the index build can probably still be faster when the
index doesn't fit in cache, even for gist_trgm_ops. Also, with that opclass, index
quality is slightly worse, but the difference is not dramatic.
------
With best regards,
Alexander Korotkov.
On 21.06.2011 13:08, Alexander Korotkov wrote:
I've created section about testing in project wiki page:
http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011#Testing_results
Do you have any notes about table structure?
It would be nice to have links to the datasets and scripts used, so that
others can reproduce the tests.
It's surprising that the search time differs so much between the
point_ops tests with uniformly random data with 100M and 10M rows. Just
to be sure I'm reading it correctly: a small search time is good, right?
You might want to spell that out explicitly.
As you can see I found that CPU usage might be much higher
with gist_trgm_ops.
Yeah, that is a bit worrisome. 6 minutes without the patch and 18
minutes with it.
I believe it's due to relatively expensive penalty
method in that opclass.
Hmm, I wonder if it could be optimized. I did a quick test, creating a
gist_trgm_ops index on a list of English words from
/usr/share/dict/words. oprofile shows that with the patch, 60% of the
CPU time is spent in the makesign() function.
But, probably index build can be still faster when
index doesn't fit cache even for gist_trgm_ops.
Yep.
Also with that opclass index
quality is slightly worse but the difference is not dramatic.
A 5-10% difference should be acceptable.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Jun 24, 2011 at 12:40 PM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:
On 21.06.2011 13:08, Alexander Korotkov wrote:
I've created section about testing in project wiki page:
http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011#Testing_results
Do you have any notes about table structure?
It would be nice to have links to the datasets and scripts used, so that
others can reproduce the tests.
Done.
It's surprising that the search time differs so much between the point_ops
tests with uniformly random data with 100M and 10M rows. Just to be sure I'm
reading it correctly: a small search time is good, right? You might want to
spell that out explicitly.
Yes, you're reading it correctly. A detailed explanation has been added to the
wiki page. It's surprising for me too. I need some more insight into the causes
of the index quality difference.
I have now found some large enough real-life datasets (thanks to Oleg Bartunov)
and I'm performing tests on them.
------
With best regards,
Alexander Korotkov.
On 24.06.2011 11:40, Heikki Linnakangas wrote:
On 21.06.2011 13:08, Alexander Korotkov wrote:
I believe it's due to relatively expensive penalty
method in that opclass.
Hmm, I wonder if it could be optimized. I did a quick test, creating a
gist_trgm_ops index on a list of English words from
/usr/share/dict/words. oprofile shows that with the patch, 60% of the
CPU time is spent in the makesign() function.
I couldn't resist looking into this, and came up with the attached
patch. I tested this with:
CREATE TABLE words (word text);
COPY words FROM '/usr/share/dict/words';
CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);
And then ran "REINDEX INDEX i_words" a few times with and without the
patch. Without the patch, reindex takes about 4.7 seconds. With the
patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes
even more important with Alexander's fast GiST build patch, which calls
the penalty function more.
I used the attached showsign-debug.patch to verify that the patched
makesign function produces the same results as the existing code. I
haven't tested the big-endian code, however.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com