GIN improvements part 1: additional information
Hackers,
Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.
Resemble GIN interface changes that this patch introduce.
Patch modifies GIN interface as following:
1) Two arguments are added to extractValue
Datum **addInfo, bool **addInfoIsNull
2) Two arguments are added to consistent
Datum addInfo[], bool addInfoIsNull[]
3) New method config is introduced which returns datatype oid of addtional
information (analogy with SP-GiST config method).
Additionally there is another patch which demonstrates benefits from
additional information storage itself (because it don't accelerate tsearch
itselt). It comes in separate thread.
------
With best regards,
Alexander Korotkov.
Attachments:
On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:
Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.
New version of patch is attached with some more refactoring and bug fixes.
------
With best regards,
Alexander Korotkov.
Attachments:
On Mon, Jun 17, 2013 at 4:54 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov <aekorotkov@gmail.com
wrote:
Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.New version of patch is attached with some more refactoring and bug fixes.
Another revision with more bug fixes.
------
With best regards,
Alexander Korotkov.
Attachments:
On Wed, Jun 19, 2013 at 12:44 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:
On Mon, Jun 17, 2013 at 4:54 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov <
aekorotkov@gmail.com> wrote:Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.New version of patch is attached with some more refactoring and bug fixes.
Another revision with more bug fixes.
New revision of patch is attached. Now it includes some docs.
------
With best regards,
Alexander Korotkov.
Attachments:
On 06/25/2013 12:03 AM, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Hi,
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:
* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever
* gin_private.h:ginDataPageLeafRead() - fetch_att() is used to retrieve
the additional info, but per the definition and comments in tupmacs.h it
expects aligned pointer.
* gindatapage.c:ginCheckPlaceToDataPageLeaf() - comment "if leaf data
page" should probably be "on a leaf data page" or so.
Regards,
Antonin Houska (Tony)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 25.06.2013 01:03, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Thanks! I'm looking into this in detail now.
First, this patch actually contains two major things:
1. Pack item pointers more tightly on posting data leaf pages.
2. Allow opclass implementation to attach "additional information" to
each item pointer.
These are two very distinct features, so this patch needs to be split
into two. I extracted the 1st part into a separate patch, attached, and
am going to focus on that now.
I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.
The patch needs a lot of cleanup still, and I may well have broken some
stuff, but I'm quite pleased with the performance. I tested this with
two tables; one is the titles from the DBLP dataset. Another is integer
arrays, created with this:
create function randomintarr() returns int[] as $$ select
array_agg((random() * 1000.0)::int4) from generate_series(1,10) $$
language sql;
create table intarrtbl as select randomintarr() as ii from
generate_series(1, 10000000);
The effect on the index sizes is quite dramatic:
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size |
--------+--------------------+-------+--------+-------------+--------+
public | gin_intarr_master | index | heikki | intarrtbl | 585 MB |
public | gin_intarr_patched | index | heikki | intarrtbl | 211 MB |
public | gin_title | index | heikki | dblp_titles | 93 MB |
public | gin_title_master | index | heikki | dblp_titles | 180 MB |
(4 rows)
Tomas Vondra tested the search performance of an earlier version of this
patch: /messages/by-id/50BFF89A.7080908@fuzzy.cz).
He initially saw a huge slowdown, but could not reproduce it with a
later version of the patch. I did not see much difference in a few quick
queries I ran, so we're probably good on that front.
There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with
the patch, so anyone using GIN probably wants to rebuild them anyway,
sooner or later. Still, I'd like to give it a shot.
2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?
3. I'd like to see some performance testing of insertions, deletions,
and vacuum. I suspect that maintaining the 32-entry index might be
fairly expensive, as it's rewritten on every update to a leaf page.
- Heikki
Attachments:
gin-packed-postinglists-1.patchtext/x-diff; name=gin-packed-postinglists-1.patchDownload+1504-874
On 27.06.2013 17:20, Antonin Houska wrote:
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever
Hmm. It won't loop forever, i is counting the number of entries that fit
on the page, while j is used to loop through the items being added.
However, i isn't actually used for anything (and isn't initialized
either), so it's just dead code.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 06/29/2013 11:00 AM, Heikki Linnakangas wrote:
On 27.06.2013 17:20, Antonin Houska wrote:
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops foreverHmm. It won't loop forever, i is counting the number of entries that
fit on the page, while j is used to loop through the items being
added. However, i isn't actually used for anything (and isn't
initialized either), so it's just dead code.- Heikki
You're right. While thinking about possible meaning of the 'i' I didn't
notice that j++ is in the 'for' construct. Stupid mistake on my side.
Tony
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 29.06.2013 11:56, Heikki Linnakangas wrote:
On 25.06.2013 01:03, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Thanks! I'm looking into this in detail now.
First, this patch actually contains two major things:
1. Pack item pointers more tightly on posting data leaf pages.
2. Allow opclass implementation to attach "additional information" to
each item pointer.These are two very distinct features, so this patch needs to be split
into two. I extracted the 1st part into a separate patch, attached, and
am going to focus on that now.I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.
Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 29.06.2013 20:08, Heikki Linnakangas wrote:
On 29.06.2013 11:56, Heikki Linnakangas wrote:
I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..
Here's the correct version. I've probably broken things, sorry about that.
I'm going to mark this as "returned with feedback" now. The code still
needs a lot of general cleanup, including comment and README updates.
Also, please take some time to consider the open questions I listed
here: archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com.
- Heikki
Attachments:
gin-packed-postinglists-2.patchtext/x-diff; name=gin-packed-postinglists-2.patchDownload+1393-429
On Sun, Jun 30, 2013 at 2:50 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:
On 29.06.2013 20:08, Heikki Linnakangas wrote:
On 29.06.2013 11:56, Heikki Linnakangas wrote:
I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..Here's the correct version. I've probably broken things, sorry about that.
I'm going to mark this as "returned with feedback" now. The code still
needs a lot of general cleanup, including comment and README updates. Also,
please take some time to consider the open questions I listed here:
archives.postgresql.org/**message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com>
.
Thanks! So, we have a lot of stuff and you give the points for further
work. Could you please verify my plan of work on these patches:
1) Solving questions of archives.postgresql.org/**
message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com>
for
packed postinglists.
2) Extract additional info patch based on packed postinglists.
3) Rewrite interface of fast scan. Do CPU and IO benchmarking.
4) Do IO benchmarking of index ordering.
Cleanup, comments and READMEs are assumed in each item.
------
With best regards,
Alexander Korotkov.
On Thu, Jun 27, 2013 at 6:20 PM, Antonin Houska <antonin.houska@gmail.com>wrote:
On 06/25/2013 12:03 AM, Alexander Korotkov wrote:
New revision of patch is attached. Now it includes some docs.
Hi,
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:* gindatapage.c:**dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever* gin_private.h:**ginDataPageLeafRead() - fetch_att() is used to retrieve
the additional info, but per the definition and comments in tupmacs.h it
expects aligned pointer.* gindatapage.c:**ginCheckPlaceToDataPageLeaf() - comment "if leaf data
page" should probably be "on a leaf data page" or so.
Hi!
Thanks for pointing these.
------
With best regards,
Alexander Korotkov.
On 01.07.2013 13:28, Alexander Korotkov wrote:
Thanks! So, we have a lot of stuff and you give the points for further
work. Could you please verify my plan of work on these patches:
1) Solving questions of archives.postgresql.org/**
message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com>
for
packed postinglists.
2) Extract additional info patch based on packed postinglists.
3) Rewrite interface of fast scan. Do CPU and IO benchmarking.
4) Do IO benchmarking of index ordering.
Cleanup, comments and READMEs are assumed in each item.
Yep, sounds good!
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
I've done a fair amount of testing by loading pgsql-general archives
into a database and running a bunch of simple ts queries that use a GIN
index.
I've tested this as well as the two other patches, but as I was able to
get meaningful results only from this patch, I'll post the results here
and info about segfaults and other observed errors to the other threads.
First of all - update the commitfest page whenever you submit a new
patch version, please. I've spent two or three hours testing and
debugging a patches linked from those pages only to find out that there
are newer versions. I should have checked that initially, but let's keep
that updated.
I wan't able to apply the patches to the current head, so I've used
b8fd1a09 (from 17/06) as a base commit.
The following table shows these metrics:
* data load
- how long it took to import ~200k messages from the list archive
- includes a lot of time spent in Python (parsing), checking FKs ...
- so unless this is significantly higher, it's probably OK
* index size
- size of the main GIN index on message body
* 1/2/3-word(s)
- number of queries in the form
SELECT id FROM messages
WHERE body_tsvector @@ plainto_tsquery('english', 'w1 w2')
LIMIT 100
(executed over 60 seconds, and 'per second' speed)
All the scripts are available at https://bitbucket.org/tvondra/archie
Now, the results:
no patches:
data load: 710 s
index size: 545 MB
1 word: 37500 (630/s)
2 words: 49800 (800/s)
3 words: 40000 (660/s)
additional info (ginaddinfo.7.patch):
data load: 693 s
index size: 448 MB
1 word: 135000 (2250/s)
2 words: 85000 (1430/s)
3 words: 54000 ( 900/s)
additional info + fast scan (gin_fast_scan.4.patch):
data load: 720 s
index size: 455 MB
1 word: FAIL
2 words: FAIL
3 words: FAIL
additional info + fast scan + ordering (gin_ordering.4.patch):
data load: FAIL
index size: N/A
1 word: N/A
2 words: N/A
3 words: N/A
So the speedup after adding info into GIN seems very promising, although
I don't quite understand why searching for two words is so much slower.
Also the index size seems to decrease significantly.
After applying 'fast scan' the things started to break down, so I wasn't
able to run the queries and then even the load failed consistently.
I'll post the info into the appropriate threads.
Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.
It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version of
patch by introducing incremental updating of that index. Benchmarks will be
posted later today.
------
With best regards,
Alexander Korotkov.
Attachments:
Please fix compiler warnings:
gindatapage.c: In function ‘dataPlaceToPage’:
gindatapage.c:706:24: warning: unused variable ‘pageBackup’ [-Wunused-variable]
gindatapage.c: In function ‘updateItemIndexes’:
gindatapage.c:1182:6: warning: variable ‘totalsize’ set but not used [-Wunused-but-set-variable]
gindatapage.c: In function ‘dataPlaceToPage’:
gindatapage.c:633:28: warning: ‘startI’ may be used uninitialized in this function [-Wuninitialized]
gindatapage.c:617:21: note: ‘startI’ was declared here
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15.09.2013 12:14, Alexander Korotkov wrote:
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version of
patch by introducing incremental updating of that index. Benchmarks will be
posted later today..
Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.
A couple of quick comments after a quick glance (in addition to the above):
The varbyte encoding stuff is a quite isolated piece of functionality.
And potentially useful elsewhere, too. It would be good to separate that
into a separate .c/.h files. I'm thinking of
src/backend/access/gin/packeditemptr.c, which would contain
ginDataPageLeafReadItemPointer, ginDataPageLeafWriteItemPointer and
ginDataPageLeafGetItemPointerSize functions. A new typedef for
varbyte-encoded things would probably be good too, something like
"typedef char *PackedItemPointer". In the new .c file, please also add a
file-header comment explaining the encoding.
The README really needs to be updated. The posting tree page structure
is a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.
It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending
on what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 16, 2013 at 11:43 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 15.09.2013 12:14, Alexander Korotkov wrote:
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner
or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version
of
patch by introducing incremental updating of that index. Benchmarks will
be
posted later today..Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.A couple of quick comments after a quick glance (in addition to the above):
The varbyte encoding stuff is a quite isolated piece of functionality. And
potentially useful elsewhere, too. It would be good to separate that into a
separate .c/.h files. I'm thinking of src/backend/access/gin/**packeditemptr.c,
which would contain ginDataPageLeafReadItemPointer**,
ginDataPageLeafWriteItemPointe**r and ginDataPageLeafGetItemPointerS**ize
functions. A new typedef for varbyte-encoded things would probably be good
too, something like "typedef char *PackedItemPointer". In the new .c file,
please also add a file-header comment explaining the encoding.The README really needs to be updated. The posting tree page structure is
a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending on
what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)
Apparently some bugs breaks index on huge updates independent on
incremental update of the 32-entry. I'm in debugging now. That's why
benchmarks are delayed.
------
With best regards,
Alexander Korotkov.
On Mon, Sep 16, 2013 at 11:43 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 15.09.2013 12:14, Alexander Korotkov wrote:
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:There's a few open questions:
1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner
or
later. Still, I'd like to give it a shot.2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version
of
patch by introducing incremental updating of that index. Benchmarks will
be
posted later today..Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.
Yes
A couple of quick comments after a quick glance (in addition to the above):
The varbyte encoding stuff is a quite isolated piece of functionality. And
potentially useful elsewhere, too. It would be good to separate that into a
separate .c/.h files. I'm thinking of src/backend/access/gin/**packeditemptr.c,
which would contain ginDataPageLeafReadItemPointer**,
ginDataPageLeafWriteItemPointe**r and ginDataPageLeafGetItemPointerS**ize
functions. A new typedef for varbyte-encoded things would probably be good
too, something like "typedef char *PackedItemPointer". In the new .c file,
please also add a file-header comment explaining the encoding.The README really needs to be updated. The posting tree page structure is
a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending on
what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)
Finally, I've debugged index update.
There are benchmark scripts attached which I used for testing. bench.sh is
doing following:
1) Switches ~/postgres to given branch, configures and compiles it
2) Initializes cluster, runs postgres, imports mailing lists archives which
could be downloaded from here:
http://mira.sai.msu.ru/~megera/tmp/pg-mailing/archives-9.1-main.dump.gz
3) Runs index creation measuring taken time and index size.
4) Runs set of index search queries measuring overall taken time and number
of used blocks. Queries was extracted from mailing lists web server logs.
So, they are real-life.
5) Runs huge updates and vacuums measuring overall taken time and final
index size.
6) Rerun set of queries.
The results of master branch:
Time taken by index build, update and search:
event | period
----------------+-----------------
index_build | 00:01:52.154299
index_update | 00:10:42.982413
search_new | 00:26:14.294872
search_updated | 00:27:06.10298
(4 rows)
Numbers of blocks used in search (not very representative, because it's
mostly consumed by heap fetches):
label | blocks_mark
----------------+-------------
search_new | 848156708
search_updated | 885122373
(2 rows)
Size of index newly created and after updates:
label | size
---------------+------------
new | 884514816
after_updates | 1595252736
(2 rows)
The results of packed posting lists branch.
Time taken by index build, update and search:
event | period
----------------+-----------------
index_build | 00:01:54.363988
index_update | 00:08:55.291099
search_new | 00:26:06.262403
search_updated | 00:27:07.501142
(4 rows)
Numbers of blocks used in search:
label | blocks_mark
----------------+-------------
search_new | 847591514
search_updated | 883928608
(2 rows)
Size of index newly created and after updates:
label | size
---------------+-----------
new | 421330944
after_updates | 718921728
(2 rows)
We can see there is no significant slowdown. Updates even becomes faster
probably because of reduced index size.
Now, I'm going to take care about WAL, refactoring and documentation.
------
With best regards,
Alexander Korotkov.
Last version of patch is attached.
WAL is debugged here. How it's actually working.
Also there was some refactoring, commenting and README update.
Benchmark scripts are also updated. Updated version is attached.
I did some benchmark comparison with different count of index entries. See
results is tables below.
event | master | 16-entries | 32-entries
| 64-entries | 128-entries |
-----------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
index_build | 00:01:50.042658 | 00:01:53.79182 |
00:01:55.647561 | 00:01:52.677095 | 00:01:58.723898 |
index_build_recovery | 00:00:19 | 00:00:06 | 00:00:05
| 00:00:06 | 00:00:06 |
index_update | 00:05:18.215707 | 00:06:09.404842 |
00:05:49.015441 | 00:05:39.987697 | 00:05:38.723376 |
index_update_recovery | 00:01:48 | 00:01:51 | 00:01:48
| 00:01:47 | 00:01:47 |
search_new | 00:25:21.481699 | 00:23:23.59775 |
00:25:13.943362 | 00:23:58.633514 | 00:22:30.763075 |
search_updated | 00:25:57.622592 | 00:25:29.867388 |
00:27:33.683614 | 00:25:17.565714 | 00:26:29.333003 |
label | size | 16-entries | 32-entries | 64-entries |
128-entries |
---------------+------------+------------+------------+------------+-------------+
new | 884514816 | 417013760 | 421240832 | 430350336 |
450994176 |
after_updates | 1595252736 | 711368704 | 719380480 | 735682560 |
774275072 |
It's probably an option to select 64 entries instead of 32.
There is still some regression in update speed. However, there is also room
for improvement patch. It searches item index entries 2 times on insert: in
dataLocateLeafItem and dataPlaceToPage. We can save full results of
dataLocateLeafItem, but it require some rework of gin btree interface:
store not only offset of item.
------
With best regards,
Alexander Korotkov.