GIN improvements part 1: additional information

Started by Alexander Korotkovalmost 13 years ago123 messageshackers
Jump to latest
#1Alexander Korotkov
aekorotkov@gmail.com

Hackers,

Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.

Resemble GIN interface changes that this patch introduce.
Patch modifies GIN interface as following:
1) Two arguments are added to extractValue
Datum **addInfo, bool **addInfoIsNull
2) Two arguments are added to consistent
Datum addInfo[], bool addInfoIsNull[]
3) New method config is introduced which returns datatype oid of addtional
information (analogy with SP-GiST config method).

Additionally there is another patch which demonstrates benefits from
additional information storage itself (because it don't accelerate tsearch
itselt). It comes in separate thread.

------
With best regards,
Alexander Korotkov.

Attachments:

ginaddinfo.4.patch.gzapplication/x-gzip; name=ginaddinfo.4.patch.gzDownload
#2Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#1)
Re: GIN improvements part 1: additional information

On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:

Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.

New version of patch is attached with some more refactoring and bug fixes.

------
With best regards,
Alexander Korotkov.

Attachments:

ginaddinfo.5.patch.gzapplication/x-gzip; name=ginaddinfo.5.patch.gzDownload
#3Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#2)
Re: GIN improvements part 1: additional information

On Mon, Jun 17, 2013 at 4:54 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov <aekorotkov@gmail.com

wrote:

Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.

New version of patch is attached with some more refactoring and bug fixes.

Another revision with more bug fixes.

------
With best regards,
Alexander Korotkov.

Attachments:

ginaddinfo.6.patch.gzapplication/x-gzip; name=ginaddinfo.6.patch.gzDownload
#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#3)
Re: GIN improvements part 1: additional information

On Wed, Jun 19, 2013 at 12:44 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:

On Mon, Jun 17, 2013 at 4:54 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

On Fri, Jun 14, 2013 at 12:09 AM, Alexander Korotkov <
aekorotkov@gmail.com> wrote:

Revised version of patch for additional information storage in GIN is
attached. Changes are mostly bug fixes.

New version of patch is attached with some more refactoring and bug fixes.

Another revision with more bug fixes.

New revision of patch is attached. Now it includes some docs.

------
With best regards,
Alexander Korotkov.

Attachments:

ginaddinfo.7.patch.gzapplication/x-gzip; name=ginaddinfo.7.patch.gzDownload
#5Antonin Houska
ah@cybertec.at
In reply to: Alexander Korotkov (#4)
Re: GIN improvements part 1: additional information

On 06/25/2013 12:03 AM, Alexander Korotkov wrote:

New revision of patch is attached. Now it includes some docs.

Hi,
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:

* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever

* gin_private.h:ginDataPageLeafRead() - fetch_att() is used to retrieve
the additional info, but per the definition and comments in tupmacs.h it
expects aligned pointer.

* gindatapage.c:ginCheckPlaceToDataPageLeaf() - comment "if leaf data
page" should probably be "on a leaf data page" or so.

Regards,
Antonin Houska (Tony)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#4)
Re: GIN improvements part 1: additional information

On 25.06.2013 01:03, Alexander Korotkov wrote:

New revision of patch is attached. Now it includes some docs.

Thanks! I'm looking into this in detail now.

First, this patch actually contains two major things:

1. Pack item pointers more tightly on posting data leaf pages.
2. Allow opclass implementation to attach "additional information" to
each item pointer.

These are two very distinct features, so this patch needs to be split
into two. I extracted the 1st part into a separate patch, attached, and
am going to focus on that now.

I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.

The patch needs a lot of cleanup still, and I may well have broken some
stuff, but I'm quite pleased with the performance. I tested this with
two tables; one is the titles from the DBLP dataset. Another is integer
arrays, created with this:

create function randomintarr() returns int[] as $$ select
array_agg((random() * 1000.0)::int4) from generate_series(1,10) $$
language sql;
create table intarrtbl as select randomintarr() as ii from
generate_series(1, 10000000);

The effect on the index sizes is quite dramatic:

postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size |
--------+--------------------+-------+--------+-------------+--------+
public | gin_intarr_master | index | heikki | intarrtbl | 585 MB |
public | gin_intarr_patched | index | heikki | intarrtbl | 211 MB |
public | gin_title | index | heikki | dblp_titles | 93 MB |
public | gin_title_master | index | heikki | dblp_titles | 180 MB |
(4 rows)

Tomas Vondra tested the search performance of an earlier version of this
patch: /messages/by-id/50BFF89A.7080908@fuzzy.cz).
He initially saw a huge slowdown, but could not reproduce it with a
later version of the patch. I did not see much difference in a few quick
queries I ran, so we're probably good on that front.

There's a few open questions:

1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with
the patch, so anyone using GIN probably wants to rebuild them anyway,
sooner or later. Still, I'd like to give it a shot.

2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?

3. I'd like to see some performance testing of insertions, deletions,
and vacuum. I suspect that maintaining the 32-entry index might be
fairly expensive, as it's rewritten on every update to a leaf page.

- Heikki

Attachments:

gin-packed-postinglists-1.patchtext/x-diff; name=gin-packed-postinglists-1.patchDownload+1504-874
#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Antonin Houska (#5)
Re: GIN improvements part 1: additional information

On 27.06.2013 17:20, Antonin Houska wrote:

I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:

* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever

Hmm. It won't loop forever, i is counting the number of entries that fit
on the page, while j is used to loop through the items being added.
However, i isn't actually used for anything (and isn't initialized
either), so it's just dead code.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Antonin Houska
ah@cybertec.at
In reply to: Heikki Linnakangas (#7)
Re: GIN improvements part 1: additional information

On 06/29/2013 11:00 AM, Heikki Linnakangas wrote:

On 27.06.2013 17:20, Antonin Houska wrote:

I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:

* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever

Hmm. It won't loop forever, i is counting the number of entries that
fit on the page, while j is used to loop through the items being
added. However, i isn't actually used for anything (and isn't
initialized either), so it's just dead code.

- Heikki

You're right. While thinking about possible meaning of the 'i' I didn't
notice that j++ is in the 'for' construct. Stupid mistake on my side.

Tony

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#6)
Re: GIN improvements part 1: additional information

On 29.06.2013 11:56, Heikki Linnakangas wrote:

On 25.06.2013 01:03, Alexander Korotkov wrote:

New revision of patch is attached. Now it includes some docs.

Thanks! I'm looking into this in detail now.

First, this patch actually contains two major things:

1. Pack item pointers more tightly on posting data leaf pages.
2. Allow opclass implementation to attach "additional information" to
each item pointer.

These are two very distinct features, so this patch needs to be split
into two. I extracted the 1st part into a separate patch, attached, and
am going to focus on that now.

I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.

Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#9)
Re: GIN improvements part 1: additional information

On 29.06.2013 20:08, Heikki Linnakangas wrote:

On 29.06.2013 11:56, Heikki Linnakangas wrote:

I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.

Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..

Here's the correct version. I've probably broken things, sorry about that.

I'm going to mark this as "returned with feedback" now. The code still
needs a lot of general cleanup, including comment and README updates.
Also, please take some time to consider the open questions I listed
here: archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com.

- Heikki

Attachments:

gin-packed-postinglists-2.patchtext/x-diff; name=gin-packed-postinglists-2.patchDownload+1393-429
#11Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#10)
Re: GIN improvements part 1: additional information

On Sun, Jun 30, 2013 at 2:50 PM, Heikki Linnakangas <hlinnakangas@vmware.com

wrote:

On 29.06.2013 20:08, Heikki Linnakangas wrote:

On 29.06.2013 11:56, Heikki Linnakangas wrote:

I made one significant change: I removed the 'freespace' field you added
to GinpageOpaque. Instead, on data leaf pages the offset from the
beginning of the page where the packed items end is stored in place of
the 'maxoff' field. This allows for quick calculation of the free space,
but there is no count of item pointers stored on the page anymore, so
some code that looped through all the item pointers relying on 'maxoff'
had to be changed to work with the end offset instead. I'm not 100%
wedded on this, but I'd like to avoid adding the redundant freespace
field on pages that don't need it, because it's confusing and you have
to remember to keep them in sync.

Ah, crap, looks like I sent the wrong patch, and now I can't find the
correct one anymore. The patch I sent didn't include the changes store
end offset instead of freespace. I'll rewrite that part..

Here's the correct version. I've probably broken things, sorry about that.

I'm going to mark this as "returned with feedback" now. The code still
needs a lot of general cleanup, including comment and README updates. Also,
please take some time to consider the open questions I listed here:
archives.postgresql.org/**message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com&gt;
.

Thanks! So, we have a lot of stuff and you give the points for further
work. Could you please verify my plan of work on these patches:
1) Solving questions of archives.postgresql.org/**
message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com&gt;
for
packed postinglists.
2) Extract additional info patch based on packed postinglists.
3) Rewrite interface of fast scan. Do CPU and IO benchmarking.
4) Do IO benchmarking of index ordering.
Cleanup, comments and READMEs are assumed in each item.

------
With best regards,
Alexander Korotkov.

#12Alexander Korotkov
aekorotkov@gmail.com
In reply to: Antonin Houska (#5)
Re: GIN improvements part 1: additional information

On Thu, Jun 27, 2013 at 6:20 PM, Antonin Houska <antonin.houska@gmail.com>wrote:

On 06/25/2013 12:03 AM, Alexander Korotkov wrote:

New revision of patch is attached. Now it includes some docs.

Hi,
I was curious about the new layout of the data page, so I spent a while
looking into the code.
It's interesting, but I suspect 2 things are not o.k.:

* gindatapage.c:**dataIsEnoughSpace() - 'i++' in the for loop should
probably be 'j++', otherwise it loops forever

* gin_private.h:**ginDataPageLeafRead() - fetch_att() is used to retrieve
the additional info, but per the definition and comments in tupmacs.h it
expects aligned pointer.

* gindatapage.c:**ginCheckPlaceToDataPageLeaf() - comment "if leaf data
page" should probably be "on a leaf data page" or so.

Hi!

Thanks for pointing these.

------
With best regards,
Alexander Korotkov.

#13Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#11)
Re: GIN improvements part 1: additional information

On 01.07.2013 13:28, Alexander Korotkov wrote:

Thanks! So, we have a lot of stuff and you give the points for further
work. Could you please verify my plan of work on these patches:
1) Solving questions of archives.postgresql.org/**
message-id/51CEA13C.8040103@**vmware.com<http://archives.postgresql.org/message-id/51CEA13C.8040103@vmware.com&gt;
for
packed postinglists.
2) Extract additional info patch based on packed postinglists.
3) Rewrite interface of fast scan. Do CPU and IO benchmarking.
4) Do IO benchmarking of index ordering.
Cleanup, comments and READMEs are assumed in each item.

Yep, sounds good!

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Korotkov (#4)
Re: GIN improvements part 1: additional information

Hi,

I've done a fair amount of testing by loading pgsql-general archives
into a database and running a bunch of simple ts queries that use a GIN
index.

I've tested this as well as the two other patches, but as I was able to
get meaningful results only from this patch, I'll post the results here
and info about segfaults and other observed errors to the other threads.

First of all - update the commitfest page whenever you submit a new
patch version, please. I've spent two or three hours testing and
debugging a patches linked from those pages only to find out that there
are newer versions. I should have checked that initially, but let's keep
that updated.

I wan't able to apply the patches to the current head, so I've used
b8fd1a09 (from 17/06) as a base commit.

The following table shows these metrics:

* data load
- how long it took to import ~200k messages from the list archive
- includes a lot of time spent in Python (parsing), checking FKs ...
- so unless this is significantly higher, it's probably OK

* index size
- size of the main GIN index on message body

* 1/2/3-word(s)
- number of queries in the form

SELECT id FROM messages
WHERE body_tsvector @@ plainto_tsquery('english', 'w1 w2')
LIMIT 100

(executed over 60 seconds, and 'per second' speed)

All the scripts are available at https://bitbucket.org/tvondra/archie

Now, the results:

no patches:
data load: 710 s
index size: 545 MB
1 word: 37500 (630/s)
2 words: 49800 (800/s)
3 words: 40000 (660/s)

additional info (ginaddinfo.7.patch):
data load: 693 s
index size: 448 MB
1 word: 135000 (2250/s)
2 words: 85000 (1430/s)
3 words: 54000 ( 900/s)

additional info + fast scan (gin_fast_scan.4.patch):
data load: 720 s
index size: 455 MB
1 word: FAIL
2 words: FAIL
3 words: FAIL

additional info + fast scan + ordering (gin_ordering.4.patch):
data load: FAIL
index size: N/A
1 word: N/A
2 words: N/A
3 words: N/A

So the speedup after adding info into GIN seems very promising, although
I don't quite understand why searching for two words is so much slower.
Also the index size seems to decrease significantly.

After applying 'fast scan' the things started to break down, so I wasn't
able to run the queries and then even the load failed consistently.

I'll post the info into the appropriate threads.

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#6)
Re: GIN improvements part 1: additional information

On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

There's a few open questions:

1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
later. Still, I'd like to give it a shot.

2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?

3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.

It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version of
patch by introducing incremental updating of that index. Benchmarks will be
posted later today.

------
With best regards,
Alexander Korotkov.

Attachments:

gin-packed-postinglists-3.patch.gzapplication/x-gzip; name=gin-packed-postinglists-3.patch.gzDownload
#16Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Korotkov (#15)
Re: GIN improvements part 1: additional information

Please fix compiler warnings:

gindatapage.c: In function ‘dataPlaceToPage’:
gindatapage.c:706:24: warning: unused variable ‘pageBackup’ [-Wunused-variable]
gindatapage.c: In function ‘updateItemIndexes’:
gindatapage.c:1182:6: warning: variable ‘totalsize’ set but not used [-Wunused-but-set-variable]
gindatapage.c: In function ‘dataPlaceToPage’:
gindatapage.c:633:28: warning: ‘startI’ may be used uninitialized in this function [-Wuninitialized]
gindatapage.c:617:21: note: ‘startI’ was declared here

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#15)
Re: GIN improvements part 1: additional information

On 15.09.2013 12:14, Alexander Korotkov wrote:

On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:

There's a few open questions:

1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
later. Still, I'd like to give it a shot.

2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?

3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.

It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version of
patch by introducing incremental updating of that index. Benchmarks will be
posted later today..

Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.

A couple of quick comments after a quick glance (in addition to the above):

The varbyte encoding stuff is a quite isolated piece of functionality.
And potentially useful elsewhere, too. It would be good to separate that
into a separate .c/.h files. I'm thinking of
src/backend/access/gin/packeditemptr.c, which would contain
ginDataPageLeafReadItemPointer, ginDataPageLeafWriteItemPointer and
ginDataPageLeafGetItemPointerSize functions. A new typedef for
varbyte-encoded things would probably be good too, something like
"typedef char *PackedItemPointer". In the new .c file, please also add a
file-header comment explaining the encoding.

The README really needs to be updated. The posting tree page structure
is a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.

It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending
on what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#17)
Re: GIN improvements part 1: additional information

On Mon, Sep 16, 2013 at 11:43 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

On 15.09.2013 12:14, Alexander Korotkov wrote:

On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:

There's a few open questions:

1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner
or
later. Still, I'd like to give it a shot.

2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?

3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.

It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version
of
patch by introducing incremental updating of that index. Benchmarks will
be
posted later today..

Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.

A couple of quick comments after a quick glance (in addition to the above):

The varbyte encoding stuff is a quite isolated piece of functionality. And
potentially useful elsewhere, too. It would be good to separate that into a
separate .c/.h files. I'm thinking of src/backend/access/gin/**packeditemptr.c,
which would contain ginDataPageLeafReadItemPointer**,
ginDataPageLeafWriteItemPointe**r and ginDataPageLeafGetItemPointerS**ize
functions. A new typedef for varbyte-encoded things would probably be good
too, something like "typedef char *PackedItemPointer". In the new .c file,
please also add a file-header comment explaining the encoding.

The README really needs to be updated. The posting tree page structure is
a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.

It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending on
what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)

Apparently some bugs breaks index on huge updates independent on
incremental update of the 32-entry. I'm in debugging now. That's why
benchmarks are delayed.

------
With best regards,
Alexander Korotkov.

#19Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#17)
Re: GIN improvements part 1: additional information

On Mon, Sep 16, 2013 at 11:43 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

On 15.09.2013 12:14, Alexander Korotkov wrote:

On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:

There's a few open questions:

1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner
or
later. Still, I'd like to give it a shot.

2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?

3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.

It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version
of
patch by introducing incremental updating of that index. Benchmarks will
be
posted later today..

Great! Please also benchmark WAL replay; you're still doing
non-incremental update of the 32-entry index in ginRedoInsert.

Yes

A couple of quick comments after a quick glance (in addition to the above):

The varbyte encoding stuff is a quite isolated piece of functionality. And
potentially useful elsewhere, too. It would be good to separate that into a
separate .c/.h files. I'm thinking of src/backend/access/gin/**packeditemptr.c,
which would contain ginDataPageLeafReadItemPointer**,
ginDataPageLeafWriteItemPointe**r and ginDataPageLeafGetItemPointerS**ize
functions. A new typedef for varbyte-encoded things would probably be good
too, something like "typedef char *PackedItemPointer". In the new .c file,
please also add a file-header comment explaining the encoding.

The README really needs to be updated. The posting tree page structure is
a lot more complicated now, there needs to be a whole new section to
explain it. A picture would be nice, similar to the one in bufpage.h.

It's a bit funny that we've clumped together all different kinds of GIN
insertions into one WAL record type. ginRedoInsert does completely
different things depending on what kind of a page it is. And the
ginXlogInsert struct also contains different kinds of things depending on
what kind of an insertion it is. It would be cleaner to split it into
three. (this isn't your patch's fault - it was like that before, too.)

Finally, I've debugged index update.

There are benchmark scripts attached which I used for testing. bench.sh is
doing following:
1) Switches ~/postgres to given branch, configures and compiles it
2) Initializes cluster, runs postgres, imports mailing lists archives which
could be downloaded from here:
http://mira.sai.msu.ru/~megera/tmp/pg-mailing/archives-9.1-main.dump.gz
3) Runs index creation measuring taken time and index size.
4) Runs set of index search queries measuring overall taken time and number
of used blocks. Queries was extracted from mailing lists web server logs.
So, they are real-life.
5) Runs huge updates and vacuums measuring overall taken time and final
index size.
6) Rerun set of queries.

The results of master branch:

Time taken by index build, update and search:
event | period
----------------+-----------------
index_build | 00:01:52.154299
index_update | 00:10:42.982413
search_new | 00:26:14.294872
search_updated | 00:27:06.10298
(4 rows)

Numbers of blocks used in search (not very representative, because it's
mostly consumed by heap fetches):
label | blocks_mark
----------------+-------------
search_new | 848156708
search_updated | 885122373
(2 rows)

Size of index newly created and after updates:
label | size
---------------+------------
new | 884514816
after_updates | 1595252736
(2 rows)

The results of packed posting lists branch.

Time taken by index build, update and search:
event | period
----------------+-----------------
index_build | 00:01:54.363988
index_update | 00:08:55.291099
search_new | 00:26:06.262403
search_updated | 00:27:07.501142
(4 rows)

Numbers of blocks used in search:
label | blocks_mark
----------------+-------------
search_new | 847591514
search_updated | 883928608
(2 rows)

Size of index newly created and after updates:
label | size
---------------+-----------
new | 421330944
after_updates | 718921728
(2 rows)

We can see there is no significant slowdown. Updates even becomes faster
probably because of reduced index size.

Now, I'm going to take care about WAL, refactoring and documentation.

------
With best regards,
Alexander Korotkov.

Attachments:

bench.tar.gzapplication/x-gzip; name=bench.tar.gzDownload+7-12
gin-packed-postinglists-4.patch.gzapplication/x-gzip; name=gin-packed-postinglists-4.patch.gzDownload
#20Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#19)
Re: GIN improvements part 1: additional information

Last version of patch is attached.
WAL is debugged here. How it's actually working.
Also there was some refactoring, commenting and README update.
Benchmark scripts are also updated. Updated version is attached.
I did some benchmark comparison with different count of index entries. See
results is tables below.

event | master | 16-entries | 32-entries
| 64-entries | 128-entries |
-----------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
index_build | 00:01:50.042658 | 00:01:53.79182 |
00:01:55.647561 | 00:01:52.677095 | 00:01:58.723898 |
index_build_recovery | 00:00:19 | 00:00:06 | 00:00:05
| 00:00:06 | 00:00:06 |
index_update | 00:05:18.215707 | 00:06:09.404842 |
00:05:49.015441 | 00:05:39.987697 | 00:05:38.723376 |
index_update_recovery | 00:01:48 | 00:01:51 | 00:01:48
| 00:01:47 | 00:01:47 |
search_new | 00:25:21.481699 | 00:23:23.59775 |
00:25:13.943362 | 00:23:58.633514 | 00:22:30.763075 |
search_updated | 00:25:57.622592 | 00:25:29.867388 |
00:27:33.683614 | 00:25:17.565714 | 00:26:29.333003 |

label | size | 16-entries | 32-entries | 64-entries |
128-entries |
---------------+------------+------------+------------+------------+-------------+
new | 884514816 | 417013760 | 421240832 | 430350336 |
450994176 |
after_updates | 1595252736 | 711368704 | 719380480 | 735682560 |
774275072 |

It's probably an option to select 64 entries instead of 32.
There is still some regression in update speed. However, there is also room
for improvement patch. It searches item index entries 2 times on insert: in
dataLocateLeafItem and dataPlaceToPage. We can save full results of
dataLocateLeafItem, but it require some rework of gin btree interface:
store not only offset of item.

------
With best regards,
Alexander Korotkov.

Attachments:

gin-packed-postinglists-5.patch.gzapplication/x-gzip; name=gin-packed-postinglists-5.patch.gzDownload
bench-2.tar.gzapplication/x-gzip; name=bench-2.tar.gzDownload+15-7
#21Bruce Momjian
bruce@momjian.us
In reply to: Alexander Korotkov (#15)
#22Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#20)
#23Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Korotkov (#22)
#24Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Eisentraut (#23)
#25Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bruce Momjian (#21)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#25)
#27Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#26)
#28Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#27)
#29Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#26)
#31Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#29)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#30)
#33Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#27)
#34Bruce Momjian
bruce@momjian.us
In reply to: Alexander Korotkov (#33)
#35Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#15)
#36Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#35)
#37Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Heikki Linnakangas (#35)
#38Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tomas Vondra (#37)
#39Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#36)
#40Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#38)
#41Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Heikki Linnakangas (#40)
#42Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tomas Vondra (#41)
#43Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#39)
#44Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Heikki Linnakangas (#42)
#45Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tomas Vondra (#44)
#46Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Korotkov (#45)
#47Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#30)
#48Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#47)
#49Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#42)
#50Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#49)
#51Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#50)
#52Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#50)
#53Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#52)
#54Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#51)
#55Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#54)
#56Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#55)
#57Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Korotkov (#55)
#58Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Eisentraut (#57)
#59Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Korotkov (#58)
#60Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Eisentraut (#59)
#61Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alvaro Herrera (#53)
#62Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#61)
#63Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#62)
#64Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#62)
#65Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Korotkov (#56)
#66Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#64)
#67Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#66)
#68Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#67)
#69Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#68)
#70Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#69)
#71Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#70)
#72Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#71)
#73Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#72)
#74Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#72)
#75Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#74)
#76Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#75)
#77Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#76)
#78Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#77)
#79Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#78)
#80Oleg Bartunov
oleg@sai.msu.su
In reply to: Heikki Linnakangas (#79)
#81Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Oleg Bartunov (#80)
#82Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#77)
#83Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#81)
#84Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#82)
#85Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#84)
#86Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alvaro Herrera (#85)
#87Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alexander Korotkov (#86)
#88Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#84)
#89Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Korotkov (#73)
#90Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Heikki Linnakangas (#84)
#91Alexander Korotkov
aekorotkov@gmail.com
In reply to: Amit Langote (#90)
#92Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Korotkov (#91)
#93Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tomas Vondra (#92)
#94Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Korotkov (#93)
#95Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Peter Eisentraut (#89)
#96Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#93)
#97Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#96)
#98Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#94)
#99Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tomas Vondra (#98)
#100Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#99)
#101Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#100)
#102Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#101)
#103Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#102)
#104Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Heikki Linnakangas (#103)
#105Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tomas Vondra (#104)
#106Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#103)
#107Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#106)
#108Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#105)
#109Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Heikki Linnakangas (#108)
#110Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#108)
#111Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#110)
#112Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tomas Vondra (#109)
#113Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#111)
#114Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#112)
#115Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#113)
#116Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#115)
#117Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#116)
#118Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#117)
#119Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#118)
#120Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#119)
#121Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#120)
#122Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Heikki Linnakangas (#112)
#123Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#122)