Fillfactor for GIN indexes

Started by Michael Paquierover 11 years ago38 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

Please find attached a simple patch adding fillfactor as storage parameter
for GIN indexes. The default value is the same as the one currently aka 100
to have the pages completely packed when a GIN index is created.

Note that to have this feature correctly working, the fix I sent yesterday
to set up isBuild for the entry insertion is needed (patch attached as well
here to facilitate the review):
/messages/by-id/CAB7nPqSc4VQ9mHKqm_YvAfcTEhO-iUY8SKbXYdnMGnAi1XnPaw@mail.gmail.com

Here are the results of some tests with a simple pg_trgm index on the
English translation of "Les Miserables":
CREATE EXTENSION pg_trgm;
CREATE TABLE les_miserables (num serial, line text);
COPY les_miserables (line) FROM '/to/path/pg135.txt';
CREATE INDEX les_miserables_100 ON les_miserables USING gin (line
gin_trgm_ops);
CREATE INDEX les_miserables_40 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 40);
CREATE INDEX les_miserables_20 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 20);
CREATE INDEX les_miserables_80 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 80);
CREATE INDEX les_miserables_10 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 10);
SELECT relname, pg_size_pretty(pg_relation_size(oid)), reloptions FROM
pg_class where relname like 'les_miserables_%';
relname | pg_size_pretty | reloptions
------------------------+----------------+-----------------
les_miserables_100 | 8256 kB | null
les_miserables_20 | 14 MB | {fillfactor=20}
les_miserables_40 | 11 MB | {fillfactor=40}
les_miserables_80 | 8712 kB | {fillfactor=80}
les_miserables_num_seq | 8192 bytes | null
(5 rows)

I am adding that to the commit fest of December.
Regards,
--
Michael

Attachments:

0002-Support-fillfactor-for-GIN-indexes.patchapplication/x-patch; name=0002-Support-fillfactor-for-GIN-indexes.patchDownload+53-9
0001-Fix-flag-marking-GIN-index-as-being-built-for-new-en.patchapplication/x-patch; name=0001-Fix-flag-marking-GIN-index-as-being-built-for-new-en.patchDownload+1-1
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: Fillfactor for GIN indexes

On Fri, Nov 21, 2014 at 2:12 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I am adding that to the commit fest of December.

Here are updated patches. Alvaro notified me about an inconsistent comment.
--
Michael

Attachments:

0001-Fix-flag-marking-GIN-index-as-being-built-for-new-en.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-flag-marking-GIN-index-as-being-built-for-new-en.patchDownload+1-1
0002-Support-fillfactor-for-GIN-indexes.patchtext/x-patch; charset=US-ASCII; name=0002-Support-fillfactor-for-GIN-indexes.patchDownload+53-9
#3Cédric Villemain
cedric@2ndquadrant.com
In reply to: Michael Paquier (#2)
Re: Fillfactor for GIN indexes

Le jeudi 27 novembre 2014 23:33:11 Michael Paquier a écrit :

On Fri, Nov 21, 2014 at 2:12 PM, Michael Paquier

<michael.paquier@gmail.com> wrote:

I am adding that to the commit fest of December.

Here are updated patches. Alvaro notified me about an inconsistent
comment.

what are the benefits of this patch ? (maybe you had some test case or a
benchmark ?)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#1)
Re: Fillfactor for GIN indexes

Hi!

On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

Please find attached a simple patch adding fillfactor as storage parameter
for GIN indexes. The default value is the same as the one currently aka 100
to have the pages completely packed when a GIN index is created.

That's not true. Let us discuss it a little bit.
In btree access method index is created by sorting index tuples. Once they
are sorted it can write a tree with any fullness of the pages. With
completely filled pages you will get flood of index page splits on table
updates or inserts. In order to avoid that the default fillfactor is 90.
Besides index creation, btree access method tries to follow given
fillfactor in the case of inserting ordered data. When rightmost page
splits then fillfactor percent of data goes to the left page.
Btree page life cycle is between being freshly branched by split (50%
filled) and being full packed (100% filled) and then splitted. Thus we can
roughly estimate average fullness of btree page to be 75%. This
"equilibrium" state can be achieved by huge amount of inserts over btree
index.

Fillfactor was spreaded to other access methods. For instance, GiST. In
GiST, tree is always constructed by page splits. So, freshly created GiST
is already in its "equilibrium" state. Even more GiST doesn't splits page
by halves. Split ration depends on opclass. So, "equilibrium" fullness of
particular GiST index is even less than 75%. What does fillfactor do with
GiST? It makes GiST pages split on fillfactor fullness on index build. So,
GiST is even less fulled. But why? You anyway don't get flood of split on
fresh GiST index because it's in "equilibrium" state. That's why I would
rather delete fillfactor from GiST until we have some algorithm to
construct GiST without page splits.

What is going on with GIN? GIN is, basically, two-level nested btree. So,
fillfactor should be applicable here. But GIN is not constructed like
btree...
GIN accumulates maintenance_work_mem of data and then inserts it into the
tree. So fresh GIN is the result of insertion of maintenance_work_mem
buckets. And each bucket is sorted. On small index (or large
maintenance_work_mem) it would be the only one bucket. GIN has two levels
of btree: entry tree and posting tree. Currently entry tree has on special
logic to control fullness during index build. So, with only one bucket
entry tree will be 50% filled. With many buckets entry tree will be at
"equilibrium" state. In some specific cases (about two buckets) entry tree
can be almost fully packed. Insertion into posting tree is always ordered
during index build because posting tree is a tree of TIDs. When posting
tree page splits it leaves left page fully packed during index build and
75% packed during insertion (because it expects some sequential inserts).

Idea of GIN building algorithm is that entry tree is expected to be
relatively small and fit to cache. Insertions into posting trees are
orders. Thus this algorithm should be IO efficient and have low overhead.
However, with large entry trees this algorithm can perform much worse.

My summary is following:
1) In order to have fully correct support of fillfactor in GIN we need to
rewrite GIN build algorithm.
2) Without rewriting GIN build algorithm, not much can be done with entry
tree. However, you can implement some heuristics.
3) You definitely need to touch code that selects ratio of split in
dataPlaceToPageLeaf (starting with if (!btree->isBuild)).
3) GIN data pages are always compressed excepts pg_upgraded indexes from
pre-9.4. Take care about it in following code.
  if (GinPageIsCompressed(page))
  freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;

------
With best regards,
Alexander Korotkov.

#5Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#4)
Re: Fillfactor for GIN indexes

On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

Please find attached a simple patch adding fillfactor as storage parameter
for GIN indexes. The default value is the same as the one currently aka 100
to have the pages completely packed when a GIN index is created.

That's not true. Let us discuss it a little bit.
In btree access method index is created by sorting index tuples. Once they
are sorted it can write a tree with any fullness of the pages. With
completely filled pages you will get flood of index page splits on table
updates or inserts. In order to avoid that the default fillfactor is 90.
Besides index creation, btree access method tries to follow given fillfactor
in the case of inserting ordered data. When rightmost page splits then
fillfactor percent of data goes to the left page.
Btree page life cycle is between being freshly branched by split (50%
filled) and being full packed (100% filled) and then splitted. Thus we can
roughly estimate average fullness of btree page to be 75%. This
"equilibrium" state can be achieved by huge amount of inserts over btree
index.

Fillfactor was spreaded to other access methods. For instance, GiST. In
GiST, tree is always constructed by page splits. So, freshly created GiST is
already in its "equilibrium" state. Even more GiST doesn't splits page by
halves. Split ration depends on opclass. So, "equilibrium" fullness of
particular GiST index is even less than 75%. What does fillfactor do with
GiST? It makes GiST pages split on fillfactor fullness on index build. So,
GiST is even less fulled. But why? You anyway don't get flood of split on
fresh GiST index because it's in "equilibrium" state. That's why I would
rather delete fillfactor from GiST until we have some algorithm to construct
GiST without page splits.

What is going on with GIN? GIN is, basically, two-level nested btree. So,
fillfactor should be applicable here. But GIN is not constructed like
btree...
GIN accumulates maintenance_work_mem of data and then inserts it into the
tree. So fresh GIN is the result of insertion of maintenance_work_mem
buckets. And each bucket is sorted. On small index (or large
maintenance_work_mem) it would be the only one bucket. GIN has two levels of
btree: entry tree and posting tree. Currently entry tree has on special
logic to control fullness during index build. So, with only one bucket entry
tree will be 50% filled. With many buckets entry tree will be at
"equilibrium" state. In some specific cases (about two buckets) entry tree
can be almost fully packed. Insertion into posting tree is always ordered
during index build because posting tree is a tree of TIDs. When posting tree
page splits it leaves left page fully packed during index build and 75%
packed during insertion (because it expects some sequential inserts).

Idea of GIN building algorithm is that entry tree is expected to be
relatively small and fit to cache. Insertions into posting trees are orders.
Thus this algorithm should be IO efficient and have low overhead. However,
with large entry trees this algorithm can perform much worse.

My summary is following:
1) In order to have fully correct support of fillfactor in GIN we need to
rewrite GIN build algorithm.
2) Without rewriting GIN build algorithm, not much can be done with entry
tree. However, you can implement some heuristics.
3) You definitely need to touch code that selects ratio of split in
dataPlaceToPageLeaf (starting with if (!btree->isBuild)).
3) GIN data pages are always compressed excepts pg_upgraded indexes from
pre-9.4. Take care about it in following code.
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;

This is a very interesting explanation; thanks for writing it up!

It does leave me wondering why anyone would want fillfactor for GIN at
all, even if they were willing to rewrite the build algorithm.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#5)
Re: Fillfactor for GIN indexes

On Wed, Dec 3, 2014 at 2:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

Please find attached a simple patch adding fillfactor as storage parameter
for GIN indexes. The default value is the same as the one currently aka 100
to have the pages completely packed when a GIN index is created.

That's not true. Let us discuss it a little bit.
[blah discussion]

That's quite a nice explanation. Thanks!

My summary is following:
1) In order to have fully correct support of fillfactor in GIN we need to
rewrite GIN build algorithm.
2) Without rewriting GIN build algorithm, not much can be done with entry
tree. However, you can implement some heuristics.

TBH, I am not really planning to rewrite the whole code.

3) You definitely need to touch code that selects ratio of split in
dataPlaceToPageLeaf (starting with if (!btree->isBuild)).

OK I see, so for a split we need to have a calculation based on the
fillfactor, with 75% by default.

4) GIN data pages are always compressed excepts pg_upgraded indexes from
pre-9.4. Take care about it in following code.
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;

Hm. Simply reversing both conditions is fine, no?

This is a very interesting explanation; thanks for writing it up!
It does leave me wondering why anyone would want fillfactor for GIN at
all, even if they were willing to rewrite the build algorithm.

Based on the explanation of Alexander, the current GIN code fills in a
page at 75% for a split, and was doing even 50/50 pre-9.4 if I recall
correctly. IMO a higher fillfactor makes sense for a GIN index that
gets less random updates, no?

I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?
--
Michael

Attachments:

20150107_gin_fillfactor_v2.patchapplication/x-patch; name=20150107_gin_fillfactor_v2.patchDownload+59-15
#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#6)
Re: Fillfactor for GIN indexes

On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Wed, Dec 3, 2014 at 2:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier <

michael.paquier@gmail.com>

wrote:

Please find attached a simple patch adding fillfactor as storage

parameter

for GIN indexes. The default value is the same as the one currently

aka 100

to have the pages completely packed when a GIN index is created.

That's not true. Let us discuss it a little bit.
[blah discussion]

That's quite a nice explanation. Thanks!

My summary is following:
1) In order to have fully correct support of fillfactor in GIN we need

to

rewrite GIN build algorithm.
2) Without rewriting GIN build algorithm, not much can be done with

entry

tree. However, you can implement some heuristics.

TBH, I am not really planning to rewrite the whole code.

3) You definitely need to touch code that selects ratio of split in
dataPlaceToPageLeaf (starting with if (!btree->isBuild)).

OK I see, so for a split we need to have a calculation based on the
fillfactor, with 75% by default.

4) GIN data pages are always compressed excepts pg_upgraded indexes from
pre-9.4. Take care about it in following code.
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;

Hm. Simply reversing both conditions is fine, no?

This is a very interesting explanation; thanks for writing it up!
It does leave me wondering why anyone would want fillfactor for GIN at
all, even if they were willing to rewrite the build algorithm.

Based on the explanation of Alexander, the current GIN code fills in a
page at 75% for a split, and was doing even 50/50 pre-9.4 if I recall
correctly. IMO a higher fillfactor makes sense for a GIN index that
gets less random updates, no?

I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?

Rewritten version of patch is attached. I made following changes:

1) I removed fillfactor handling from entry tree. Because in this case
fillfactor parameter would be only upper bound for actual page fullness.
It's very like GiST approach to fillfactor. But I would rather remove
fillfactor from GiST than spread such approach to other AMs.
2) Fillfactor handling for posting trees is fixed.
3) Default fillfactor for GIN is reverted to 90. I didn't mean that default
fillfactor should be 75%. I mean that equilibrium state of tree would be
about 75% fullness. But that doesn't mean that we don't want indexes to be
better packed just after creation. Anyway I didn't see reason why default
fillfactor for GIN btree should differs from default fillfactor of regular
btree.

------
With best regards,
Alexander Korotkov.

Attachments:

gin_fillfactor_3.patchapplication/octet-stream; name=gin_fillfactor_3.patchDownload+71-20
#8Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#7)
Re: Fillfactor for GIN indexes

On Thu, Jan 8, 2015 at 12:31 AM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:

Rewritten version of patch is attached. I made following changes:

1) I removed fillfactor handling from entry tree. Because in this case
fillfactor parameter would be only upper bound for actual page fullness.
It's very like GiST approach to fillfactor. But I would rather remove
fillfactor from GiST than spread such approach to other AMs.
2) Fillfactor handling for posting trees is fixed.
3) Default fillfactor for GIN is reverted to 90. I didn't mean that
default fillfactor should be 75%. I mean that equilibrium state of tree
would be about 75% fullness. But that doesn't mean that we don't want
indexes to be better packed just after creation. Anyway I didn't see reason
why default fillfactor for GIN btree should differs from default fillfactor
of regular btree.

Just forgot simple case that I used to test the patch.

create table test as (select array[1,2,3] v from
generate_series(1,1000000));
create index test_idx20 on test using gin(v) with (fillfactor=20);
create index test_idx30 on test using gin(v) with (fillfactor=30);
create index test_idx40 on test using gin(v) with (fillfactor=40);
create index test_idx50 on test using gin(v) with (fillfactor=50);
create index test_idx60 on test using gin(v) with (fillfactor=60);
create index test_idx70 on test using gin(v) with (fillfactor=70);
create index test_idx80 on test using gin(v) with (fillfactor=80);
create index test_idx90 on test using gin(v) with (fillfactor=90);
create index test_idx100 on test using gin(v) with (fillfactor=100);

List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------+-------+--------+-------+---------+-------------
public | test_idx20 | index | smagen | test | 16 MB |
public | test_idx30 | index | smagen | test | 11 MB |
public | test_idx40 | index | smagen | test | 8152 kB |
public | test_idx50 | index | smagen | test | 6520 kB |
public | test_idx60 | index | smagen | test | 5176 kB |
public | test_idx70 | index | smagen | test | 4480 kB |
public | test_idx80 | index | smagen | test | 3928 kB |
public | test_idx90 | index | smagen | test | 3520 kB |
public | test_idx100 | index | smagen | test | 3184 kB |
(9 rows)

------
With best regards,
Alexander Korotkov.

#9Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#7)
Re: Fillfactor for GIN indexes

On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier <michael.paquier@gmail.com>

I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?

Rewritten version of patch is attached. I made following changes:

Thanks! With this patch (and my previous version as well) GIN indexes
with default fillfactor have a size higher than 9.4 indexes, 9.4
behavior being consistent only with fillfactor=100 and not the default
of 90. Are we fine with that?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#9)
Re: Fillfactor for GIN indexes

On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier <michael.paquier@gmail.com>

I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?

Rewritten version of patch is attached. I made following changes:

Thanks! With this patch (and my previous version as well) GIN indexes
with default fillfactor have a size higher than 9.4 indexes, 9.4
behavior being consistent only with fillfactor=100 and not the default
of 90. Are we fine with that?

IMO, this patch has value to control random updates on GIN indexes,
but we should have a default fillfactor of 100 to have index size
consistent with 9.4. Thoughts?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#10)
Re: Fillfactor for GIN indexes

On Thu, Jan 15, 2015 at 10:19 AM, Michael Paquier <michael.paquier@gmail.com

wrote:

On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov <aekorotkov@gmail.com>

wrote:

On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier <

michael.paquier@gmail.com>

I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?

Rewritten version of patch is attached. I made following changes:

Thanks! With this patch (and my previous version as well) GIN indexes
with default fillfactor have a size higher than 9.4 indexes, 9.4
behavior being consistent only with fillfactor=100 and not the default
of 90. Are we fine with that?

IMO, this patch has value to control random updates on GIN indexes,
but we should have a default fillfactor of 100 to have index size
consistent with 9.4. Thoughts?

I'm not sure. On the one hand it's unclear why fillfactor should be
different from 9.4.
On the other hand it's unclear why it should be different from btree.
I propose marking this "ready for committer". So, committer can make a
final decision.

------
With best regards,
Alexander Korotkov.

#12Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#11)
Re: Fillfactor for GIN indexes

Alexander Korotkov wrote:

I'm not sure. On the one hand it's unclear why fillfactor should be
different from 9.4.
On the other hand it's unclear why it should be different from btree.
I propose marking this "ready for committer". So, committer can make a final
decision.

OK let's do so then. My preference is to fully pack the index at
build. GIN compression has been one of the headlines of 9.4.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#12)
Re: Fillfactor for GIN indexes

On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Alexander Korotkov wrote:

I'm not sure. On the one hand it's unclear why fillfactor should be
different from 9.4.
On the other hand it's unclear why it should be different from btree.
I propose marking this "ready for committer". So, committer can make a final
decision.

OK let's do so then. My preference is to fully pack the index at
build. GIN compression has been one of the headlines of 9.4.

I'm struggling to understand why we shouldn't just reject this patch.
On November 27th, Cedric said:

"what are the benefits of this patch ? (maybe you had some test case
or a benchmark ?)"

Nobody replied. On January 15th, you (Michael) hypothesized that
"this patch has value to control random updates on GIN indexes" but
there seem to be absolutely no test results showing that any such
value exists.

There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#13)
Re: Fillfactor for GIN indexes

On Sat, Jan 17, 2015 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:

There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?

Indeed. There is no such benchmark as of now, and I am not sure I'll
get the time to work more on that soon, so let's reject the patch for
now. And sorry for the useless noise.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#13)
Re: Fillfactor for GIN indexes

On Fri, Jan 16, 2015 at 8:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Alexander Korotkov wrote:

I'm not sure. On the one hand it's unclear why fillfactor should be
different from 9.4.
On the other hand it's unclear why it should be different from btree.
I propose marking this "ready for committer". So, committer can make a

final

decision.

OK let's do so then. My preference is to fully pack the index at
build. GIN compression has been one of the headlines of 9.4.

I'm struggling to understand why we shouldn't just reject this patch.
On November 27th, Cedric said:

"what are the benefits of this patch ? (maybe you had some test case
or a benchmark ?)"

Nobody replied. On January 15th, you (Michael) hypothesized that
"this patch has value to control random updates on GIN indexes" but
there seem to be absolutely no test results showing that any such
value exists.

There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?

I already wrote quite detailed explanation of subject. Let mel try to
explain in shortly. GIN is two level nested btree. Thus, GIN would have
absolutely same benefits from fillfactor as btree. Lack of tests showing it
is, for sure, fault.

However, GIN posting trees are ordered by ItemPointer and this makes some
specific. If you have freshly created table and do inserts/updates they
would use the end of heap. Thus, inserts would go to the end of GIN posting
tree and fillfactor wouldn't affect anything. Fillfactor would give
benefits on HOT or heap space re-usage.

In the following example you can see that index size was increased greatly
while updating every 20th row. It's because every update causes page split
in index.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100,
fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3184 kB* │

# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *5264 kB* │
(1 row)

But if we create index with fillfactor=90, index size would remain the
same: no page splits.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx90 on test using gin(v) with (fillfactor=90,
fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ *3520 kB* │

# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ *3520 kB* │
(1 row)

Similar situation would be if we use fastupdate. But fastupdate takes some
space for pending lists which is independent from fillfactor.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100,
fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3184 kB* │

# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *7256 kB* │

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=90,
fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3520 kB* │

# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *5512 kB* │

BTW, previous version of patch contained some bugs. Revised version is
attached.

------
With best regards,
Alexander Korotkov.

Attachments:

gin_fillfactor_4.patchapplication/octet-stream; name=gin_fillfactor_4.patchDownload+71-20
#16Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#15)
Re: Fillfactor for GIN indexes

On Sat, Jan 17, 2015 at 12:49 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:

BTW, previous version of patch contained some bugs. Revised version is
attached.

Ooops, it's here.

------
With best regards,
Alexander Korotkov.

Attachments:

gin_fillfactor_5.patchapplication/octet-stream; name=gin_fillfactor_5.patchDownload+76-20
#17Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#15)
Re: Fillfactor for GIN indexes

On Sat, Jan 17, 2015 at 4:49 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

I already wrote quite detailed explanation of subject. Let mel try to
explain in shortly. GIN is two level nested btree. Thus, GIN would have
absolutely same benefits from fillfactor as btree. Lack of tests showing it
is, for sure, fault.

However, GIN posting trees are ordered by ItemPointer and this makes some
specific. If you have freshly created table and do inserts/updates they
would use the end of heap. Thus, inserts would go to the end of GIN posting
tree and fillfactor wouldn't affect anything. Fillfactor would give benefits
on HOT or heap space re-usage.

Ah, OK. Those tests clarify things considerably; I see the point now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Cédric Villemain
cedric@2ndquadrant.com
In reply to: Robert Haas (#17)
Re: Fillfactor for GIN indexes

Le lundi 19 janvier 2015 08:24:08 Robert Haas a écrit :

On Sat, Jan 17, 2015 at 4:49 AM, Alexander Korotkov

<aekorotkov@gmail.com> wrote:

I already wrote quite detailed explanation of subject. Let mel try
to
explain in shortly. GIN is two level nested btree. Thus, GIN would
have absolutely same benefits from fillfactor as btree. Lack of
tests showing it is, for sure, fault.

However, GIN posting trees are ordered by ItemPointer and this makes
some specific. If you have freshly created table and do
inserts/updates they would use the end of heap. Thus, inserts would
go to the end of GIN posting tree and fillfactor wouldn't affect
anything. Fillfactor would give benefits on HOT or heap space
re-usage.

Ah, OK. Those tests clarify things considerably; I see the point now.

So I do.

Alexander said:
1) In order to have fully correct support of fillfactor in GIN we need to
rewrite GIN build algorithm.
2) Without rewriting GIN build algorithm, not much can be done with entry
tree. However, you can implement some heuristics.

The patch is 2), for the posting tree only ?

Thanks!
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#19Alexander Korotkov
aekorotkov@gmail.com
In reply to: Cédric Villemain (#18)
Re: Fillfactor for GIN indexes

On Mon, Jan 19, 2015 at 5:46 PM, Cédric Villemain <cedric@2ndquadrant.com>
wrote:

Le lundi 19 janvier 2015 08:24:08 Robert Haas a écrit :

On Sat, Jan 17, 2015 at 4:49 AM, Alexander Korotkov

<aekorotkov@gmail.com> wrote:

I already wrote quite detailed explanation of subject. Let mel try

to

explain in shortly. GIN is two level nested btree. Thus, GIN would

have absolutely same benefits from fillfactor as btree. Lack of

tests showing it is, for sure, fault.

However, GIN posting trees are ordered by ItemPointer and this makes

some specific. If you have freshly created table and do

inserts/updates they would use the end of heap. Thus, inserts would

go to the end of GIN posting tree and fillfactor wouldn't affect

anything. Fillfactor would give benefits on HOT or heap space

re-usage.

Ah, OK. Those tests clarify things considerably; I see the point now.

So I do.

Alexander said:

1) In order to have fully correct support of fillfactor in GIN we need to
rewrite GIN build algorithm.

2) Without rewriting GIN build algorithm, not much can be done with entry
tree. However, you can implement some heuristics.

The patch is 2), for the posting tree only ?

Yes, it's just for posting tree.

------
With best regards,
Alexander Korotkov.

#20Cédric Villemain
cedric@2ndquadrant.com
In reply to: Michael Paquier (#14)
Re: Fillfactor for GIN indexes

Le samedi 17 janvier 2015 08:23:44 Michael Paquier a écrit :

On Sat, Jan 17, 2015 at 2:40 AM, Robert Haas

<robertmhaas@gmail.com> wrote:

There's only value in adding a fillfactor parameter to GIN indexes
if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?

Indeed. There is no such benchmark as of now, and I am not sure I'll
get the time to work more on that soon, so let's reject the patch for
now. And sorry for the useless noise.

Michael, I first didn't understood how GiN can benefits from the
patch...thus my question.
There were no noise for me, and I learn some more about GiN. So I thank
you for your work!

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#21Michael Paquier
michael@paquier.xyz
In reply to: Cédric Villemain (#20)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Korotkov (#19)
#23Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tomas Vondra (#22)
#24Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Korotkov (#23)
#25Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#16)
#26Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#26)
#28Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#27)
#29Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Alexander Korotkov (#28)
#31Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Michael Paquier (#30)
#32Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#31)
#33Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#32)
#34Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#33)
#35Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#34)
#36Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#35)
#37Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#36)
#38Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#37)