Fillfactor for GIN indexes
Hi all,
Please find attached a simple patch adding fillfactor as a storage parameter
for GIN indexes. The default value is the same as the current behavior, i.e.
100, so that pages are completely packed when a GIN index is created.
Note that for this feature to work correctly, the fix I sent yesterday to set
isBuild for entry insertion is needed (that patch is attached here as well to
facilitate the review):
/messages/by-id/CAB7nPqSc4VQ9mHKqm_YvAfcTEhO-iUY8SKbXYdnMGnAi1XnPaw@mail.gmail.com
Here are the results of some tests with a simple pg_trgm index on the
English translation of "Les Miserables":
CREATE EXTENSION pg_trgm;
CREATE TABLE les_miserables (num serial, line text);
COPY les_miserables (line) FROM '/to/path/pg135.txt';
CREATE INDEX les_miserables_100 ON les_miserables USING gin (line
gin_trgm_ops);
CREATE INDEX les_miserables_40 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 40);
CREATE INDEX les_miserables_20 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 20);
CREATE INDEX les_miserables_80 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 80);
CREATE INDEX les_miserables_10 ON les_miserables USING gin (line
gin_trgm_ops) with (fillfactor = 10);
(The last command fails, as the patch enforces a minimum fillfactor of 20,
hence no les_miserables_10 below.)
SELECT relname, pg_size_pretty(pg_relation_size(oid)), reloptions FROM
pg_class where relname like 'les_miserables_%';
        relname         | pg_size_pretty |   reloptions
------------------------+----------------+-----------------
 les_miserables_100     | 8256 kB        | null
 les_miserables_20      | 14 MB          | {fillfactor=20}
 les_miserables_40      | 11 MB          | {fillfactor=40}
 les_miserables_80      | 8712 kB        | {fillfactor=80}
 les_miserables_num_seq | 8192 bytes     | null
(5 rows)
I am adding that to the commit fest of December.
Regards,
--
Michael
Attachments:
0002-Support-fillfactor-for-GIN-indexes.patch
From eb48305ac0295fa4a46ffec5f8db447cd4c5f6b2 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Fri, 21 Nov 2014 14:08:54 +0900
Subject: [PATCH 2/2] Support fillfactor for GIN indexes
Users can set this new storage parameter to control how densely the
entry and leaf pages of a newly-built index are packed. The fillfactor
ranges between 20 and 100.
---
doc/src/sgml/ref/create_index.sgml | 4 ++--
src/backend/access/common/reloptions.c | 9 +++++++++
src/backend/access/gin/gindatapage.c | 22 ++++++++++++++++++----
src/backend/access/gin/ginentrypage.c | 20 +++++++++++++++++++-
src/backend/access/gin/ginutil.c | 3 ++-
src/include/access/gin_private.h | 3 +++
6 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 6b2ee28..c0ba24a 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
- storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
- accept this parameter:
+ storage parameters. The B-tree, hash, GIN, GiST and SP-GiST index methods
+ all accept this parameter:
</para>
<variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c16b38e..7137ba9 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -15,6 +15,7 @@
#include "postgres.h"
+#include "access/gin_private.h"
#include "access/gist_private.h"
#include "access/hash.h"
#include "access/htup_details.h"
@@ -133,6 +134,14 @@ static relopt_int intRelOpts[] =
},
{
{
+ "fillfactor",
+ "Packs gin index pages only to this percentage",
+ RELOPT_KIND_GIN
+ },
+ GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+ },
+ {
+ {
"autovacuum_vacuum_threshold",
"Minimum number of tuple updates or deletes prior to vacuum",
RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 012225e..f322004 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -446,11 +446,21 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
leafSegmentInfo *lastleftinfo;
ItemPointerData maxOldItem;
ItemPointerData remaining;
+ int fillfactor;
Assert(GinPageIsData(page));
rbound = *GinDataPageGetRightBound(page);
+ /* Grab option values */
+ if (btree->index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) btree->index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
+
/*
* Count how many of the new items belong to this page.
*/
@@ -511,15 +521,19 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
/*
* If we're appending to the end of the page, we will append as many items
- * as we can fit (after splitting), and stop when the pages becomes full.
- * Otherwise we have to limit the number of new items to insert, because
- * once we start packing we can't just stop when we run out of space,
- * because we must make sure that all the old items still fit.
+ * as we can fit up to the given fillfactor at build (after splitting),
+ * and stop when the pages becomes full at this rate. Otherwise we have to
+ * limit the number of new items to insert, because once we start packing
+ * we can't just stop when we run out of space, because we must make sure
+ * that all the old items still fit.
*/
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;
else
freespace = 0;
+
if (append)
{
/*
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 2dae7b9..36eccd7 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -462,6 +462,16 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
Size releasedsz = 0;
Size addedsz;
Page page = BufferGetPage(buf);
+ int fillfactor;
+ int freespace;
+
+ if (btree->index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) btree->index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
Assert(insertData->entry);
Assert(!GinPageIsData(page));
@@ -475,7 +485,15 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
addedsz = MAXALIGN(IndexTupleSize(insertData->entry)) + sizeof(ItemIdData);
- if (PageGetFreeSpace(page) + releasedsz >= addedsz)
+ /*
+ * Calculate freespace available. When building the index take into
+ * account the fillfactor.
+ */
+ if (btree->isBuild)
+ freespace = PageGetFreeSpace(page) - BLCKSZ * (100 - fillfactor) / 100;
+ else
+ freespace = PageGetFreeSpace(page);
+ if (freespace + releasedsz >= addedsz)
return true;
return false;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index f593a72..bfdfd0c 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -527,7 +527,8 @@ ginoptions(PG_FUNCTION_ARGS)
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
- pendingListCleanupSize)}
+ pendingListCleanupSize)},
+ {"fillfactor", RELOPT_TYPE_INT, offsetof(GinOptions, fillfactor)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3d46f20..855091b 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -315,6 +315,7 @@ typedef struct GinOptions
int32 vl_len_; /* varlena header (do not touch directly!) */
bool useFastUpdate; /* use fast updates? */
int pendingListCleanupSize; /* maximum size of pending list */
+ int fillfactor; /* page fillfactor in percent (0..100) */
} GinOptions;
#define GIN_DEFAULT_USE_FASTUPDATE true
@@ -327,6 +328,8 @@ typedef struct GinOptions
((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
gin_pending_list_limit)
+#define GIN_MIN_FILLFACTOR 20
+#define GIN_DEFAULT_FILLFACTOR 100
/* Macros for buffer lock/unlock operations */
#define GIN_UNLOCK BUFFER_LOCK_UNLOCK
--
2.1.3
0001-Fix-flag-marking-GIN-index-as-being-built-for-new-en.patch
From eda0730d991f8b4dfbacc4d7a953ec5bff8b2ffe Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Fri, 21 Nov 2014 13:40:11 +0900
Subject: [PATCH 1/2] Fix flag marking GIN index as being built for new entries
This flag was missing in the current implementation, which led to
problems for code that needs special handling while a fresh index is
being built. Note that this does not impact current code, as no such
operations are done yet, but it may become a problem if a future bug
fix needs to make this distinction.
---
src/backend/access/gin/gininsert.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c1ad0fd..c6d8b40 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -191,6 +191,7 @@ ginEntryInsert(GinState *ginstate,
buildStats->nEntries++;
ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
+ btree.isBuild = (buildStats != NULL);
stack = ginFindLeafPage(&btree, false);
page = BufferGetPage(stack->buffer);
--
2.1.3
On Fri, Nov 21, 2014 at 2:12 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I am adding that to the commit fest of December.
Here are updated patches. Alvaro notified me about an inconsistent comment.
--
Michael
Attachments:
0001-Fix-flag-marking-GIN-index-as-being-built-for-new-en.patch
From eda0730d991f8b4dfbacc4d7a953ec5bff8b2ffe Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Fri, 21 Nov 2014 13:40:11 +0900
Subject: [PATCH 1/2] Fix flag marking GIN index as being built for new entries
This flag was missing in the current implementation, which led to
problems for code that needs special handling while a fresh index is
being built. Note that this does not impact current code, as no such
operations are done yet, but it may become a problem if a future bug
fix needs to make this distinction.
---
src/backend/access/gin/gininsert.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c1ad0fd..c6d8b40 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -191,6 +191,7 @@ ginEntryInsert(GinState *ginstate,
buildStats->nEntries++;
ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
+ btree.isBuild = (buildStats != NULL);
stack = ginFindLeafPage(&btree, false);
page = BufferGetPage(stack->buffer);
--
2.2.0
0002-Support-fillfactor-for-GIN-indexes.patch
From eeb6f4d4cd4dcc713c7c4a6526fa9d4cf49c0262 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Fri, 21 Nov 2014 14:08:54 +0900
Subject: [PATCH 2/2] Support fillfactor for GIN indexes
Users can set this new storage parameter to control how densely the
entry and leaf pages of a newly-built index are packed. The fillfactor
ranges between 20 and 100.
---
doc/src/sgml/ref/create_index.sgml | 4 ++--
src/backend/access/common/reloptions.c | 9 +++++++++
src/backend/access/gin/gindatapage.c | 22 ++++++++++++++++++----
src/backend/access/gin/ginentrypage.c | 20 +++++++++++++++++++-
src/backend/access/gin/ginutil.c | 3 ++-
src/include/access/gin_private.h | 3 +++
6 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 6b2ee28..c0ba24a 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
- storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
- accept this parameter:
+ storage parameters. The B-tree, hash, GIN, GiST and SP-GiST index methods
+ all accept this parameter:
</para>
<variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c16b38e..7137ba9 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -15,6 +15,7 @@
#include "postgres.h"
+#include "access/gin_private.h"
#include "access/gist_private.h"
#include "access/hash.h"
#include "access/htup_details.h"
@@ -133,6 +134,14 @@ static relopt_int intRelOpts[] =
},
{
{
+ "fillfactor",
+ "Packs gin index pages only to this percentage",
+ RELOPT_KIND_GIN
+ },
+ GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+ },
+ {
+ {
"autovacuum_vacuum_threshold",
"Minimum number of tuple updates or deletes prior to vacuum",
RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 012225e..f322004 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -446,11 +446,21 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
leafSegmentInfo *lastleftinfo;
ItemPointerData maxOldItem;
ItemPointerData remaining;
+ int fillfactor;
Assert(GinPageIsData(page));
rbound = *GinDataPageGetRightBound(page);
+ /* Grab option values */
+ if (btree->index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) btree->index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
+
/*
* Count how many of the new items belong to this page.
*/
@@ -511,15 +521,19 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
/*
* If we're appending to the end of the page, we will append as many items
- * as we can fit (after splitting), and stop when the pages becomes full.
- * Otherwise we have to limit the number of new items to insert, because
- * once we start packing we can't just stop when we run out of space,
- * because we must make sure that all the old items still fit.
+ * as we can fit up to the given fillfactor at build (after splitting),
+ * and stop when the pages becomes full at this rate. Otherwise we have to
+ * limit the number of new items to insert, because once we start packing
+ * we can't just stop when we run out of space, because we must make sure
+ * that all the old items still fit.
*/
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;
else
freespace = 0;
+
if (append)
{
/*
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 2dae7b9..36eccd7 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -462,6 +462,16 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
Size releasedsz = 0;
Size addedsz;
Page page = BufferGetPage(buf);
+ int fillfactor;
+ int freespace;
+
+ if (btree->index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) btree->index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
Assert(insertData->entry);
Assert(!GinPageIsData(page));
@@ -475,7 +485,15 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
addedsz = MAXALIGN(IndexTupleSize(insertData->entry)) + sizeof(ItemIdData);
- if (PageGetFreeSpace(page) + releasedsz >= addedsz)
+ /*
+ * Calculate freespace available. When building the index take into
+ * account the fillfactor.
+ */
+ if (btree->isBuild)
+ freespace = PageGetFreeSpace(page) - BLCKSZ * (100 - fillfactor) / 100;
+ else
+ freespace = PageGetFreeSpace(page);
+ if (freespace + releasedsz >= addedsz)
return true;
return false;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index f593a72..bfdfd0c 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -527,7 +527,8 @@ ginoptions(PG_FUNCTION_ARGS)
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
- pendingListCleanupSize)}
+ pendingListCleanupSize)},
+ {"fillfactor", RELOPT_TYPE_INT, offsetof(GinOptions, fillfactor)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3d46f20..0a20749 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -315,6 +315,7 @@ typedef struct GinOptions
int32 vl_len_; /* varlena header (do not touch directly!) */
bool useFastUpdate; /* use fast updates? */
int pendingListCleanupSize; /* maximum size of pending list */
+ int fillfactor; /* page fillfactor in percent (20..100) */
} GinOptions;
#define GIN_DEFAULT_USE_FASTUPDATE true
@@ -327,6 +328,8 @@ typedef struct GinOptions
((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
gin_pending_list_limit)
+#define GIN_MIN_FILLFACTOR 20
+#define GIN_DEFAULT_FILLFACTOR 100
/* Macros for buffer lock/unlock operations */
#define GIN_UNLOCK BUFFER_LOCK_UNLOCK
--
2.2.0
On Thursday 27 November 2014 23:33:11, Michael Paquier wrote:
On Fri, Nov 21, 2014 at 2:12 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I am adding that to the commit fest of December.
Here are updated patches. Alvaro notified me about an inconsistent
comment.
What are the benefits of this patch? (Maybe you have a test case or a
benchmark?)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Development, Expertise and Training
Hi!
On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
Please find attached a simple patch adding fillfactor as a storage parameter
for GIN indexes. The default value is the same as the current behavior, i.e.
100, so that pages are completely packed when a GIN index is created.
That's not true. Let us discuss it a little bit.
In the btree access method, an index is created by sorting the index tuples.
Once they are sorted, the build can write a tree with any page fullness. With
completely filled pages you get a flood of index page splits on table updates
or inserts; to avoid that, the default fillfactor is 90. Besides index
creation, the btree access method tries to follow the given fillfactor when
ordered data is inserted: when the rightmost page splits, fillfactor percent
of the data goes to the left page.
A btree page's life cycle runs between being freshly produced by a split (50%
filled) and being fully packed (100% filled) and then split again. Thus we can
roughly estimate the average fullness of a btree page to be 75%. This
"equilibrium" state is reached after a huge number of inserts into the btree
index.
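As a quick sketch of that effect at build time (on a hypothetical scratch
table; the btree default fillfactor is 90):
create table scratch as (select id from generate_series(1,1000000) id);
create index scratch_ff100 on scratch (id) with (fillfactor=100);
create index scratch_ff90 on scratch (id); -- default fillfactor of 90
select relname, pg_size_pretty(pg_relation_size(oid))
from pg_class where relname like 'scratch_ff%';
The freshly built ff90 index should come out on the order of 10% larger than
the fully packed one, but it absorbs a burst of inserts without an immediate
flood of page splits.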
Fillfactor was later spread to other access methods, for instance GiST. In
GiST, the tree is always constructed by page splits, so a freshly created
GiST index is already in its "equilibrium" state. Moreover, GiST doesn't
split pages in halves: the split ratio depends on the opclass, so the
"equilibrium" fullness of a particular GiST index is even less than 75%.
What does fillfactor do in GiST? It makes GiST pages split at fillfactor
fullness during index build, leaving the index even less full. But why? You
don't get a flood of splits on a fresh GiST index anyway, because it is
already in its "equilibrium" state. That's why I would rather delete
fillfactor from GiST until we have some algorithm to construct GiST without
page splits.
What is going on with GIN? GIN is, basically, a two-level nested btree, so
fillfactor should be applicable here. But GIN is not constructed like
btree...
GIN accumulates maintenance_work_mem worth of data and then inserts it into
the tree, so a fresh GIN index is the result of inserting a sequence of
maintenance_work_mem-sized buckets, each of which is sorted. On a small index
(or with a large maintenance_work_mem) there would be only one bucket. GIN
has two levels of btree: the entry tree and the posting trees. Currently the
entry tree has no special logic to control fullness during index build, so
with only one bucket the entry tree ends up 50% filled; with many buckets it
ends up at its "equilibrium" state, and in some specific cases (about two
buckets) it can be almost fully packed. Insertion into a posting tree is
always ordered during index build, because a posting tree is a tree of TIDs.
When a posting tree page splits, it leaves the left page fully packed during
index build, and 75% packed during normal insertion (because it expects some
sequential inserts).
The idea of the GIN build algorithm is that the entry tree is expected to be
relatively small and to fit in cache, while insertions into the posting
trees are ordered. Thus the algorithm should be IO-efficient and have low
overhead. However, with a large entry tree this algorithm can perform much
worse.
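That bucketing is easy to observe (a sketch reusing the pg_trgm test table
from upthread; the index names are made up):
set maintenance_work_mem = '1MB';
create index les_mis_small_mwm on les_miserables using gin (line gin_trgm_ops);
set maintenance_work_mem = '1GB';
create index les_mis_large_mwm on les_miserables using gin (line gin_trgm_ops);
select relname, pg_size_pretty(pg_relation_size(oid))
from pg_class where relname like 'les_mis_%_mwm';
With many small buckets the entry tree should approach its "equilibrium"
fullness, while the single-bucket build leaves it around 50% filled, so the
sizes may differ noticeably (the effect is limited to the entry tree).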
My summary is the following:
1) In order to have fully correct support of fillfactor in GIN, we would need
to rewrite the GIN build algorithm.
2) Without rewriting the GIN build algorithm, not much can be done with the
entry tree. However, you can implement some heuristics.
3) You definitely need to touch the code that selects the split ratio in
dataPlaceToPageLeaf (starting with if (!btree->isBuild)).
4) GIN data pages are always compressed, except in pg_upgraded indexes from
pre-9.4. Take care of that in the following code:
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;
------
With best regards,
Alexander Korotkov.
On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
[... Alexander's full explanation, quoted above ...]
This is a very interesting explanation; thanks for writing it up!
It does leave me wondering why anyone would want fillfactor for GIN at
all, even if they were willing to rewrite the build algorithm.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 3, 2014 at 2:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
Please find attached a simple patch adding fillfactor as a storage parameter
for GIN indexes. [...]
That's not true. Let us discuss it a little bit.
[blah discussion]
That's quite a nice explanation. Thanks!
My summary is the following:
1) In order to have fully correct support of fillfactor in GIN, we would need
to rewrite the GIN build algorithm.
2) Without rewriting the GIN build algorithm, not much can be done with the
entry tree. However, you can implement some heuristics.
TBH, I am not really planning to rewrite the whole code.
3) You definitely need to touch the code that selects the split ratio in
dataPlaceToPageLeaf (starting with if (!btree->isBuild)).
OK I see, so for a split we need to have a calculation based on the
fillfactor, with 75% by default.
4) GIN data pages are always compressed, except in pg_upgraded indexes from
pre-9.4. Take care of that in the following code:
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
+ else if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;
Hm. Simply reversing both conditions is fine, no?
This is a very interesting explanation; thanks for writing it up!
It does leave me wondering why anyone would want fillfactor for GIN at
all, even if they were willing to rewrite the build algorithm.
Based on the explanation from Alexander, the current GIN code fills a page
to 75% on a split, and was even doing 50/50 pre-9.4 if I recall correctly.
IMO a higher fillfactor makes sense for a GIN index that gets fewer random
updates, no?
I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?
--
Michael
Attachments:
20150107_gin_fillfactor_v2.patch
From 39c6cab320f648b61cc65654ae67e62d8926bfa1 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Fri, 21 Nov 2014 14:08:54 +0900
Subject: [PATCH] Support fillfactor for GIN indexes
Users can set this new storage parameter to control how densely the
entry and leaf pages of a newly-built index are packed. The fillfactor
ranges between 20 and 100.
---
doc/src/sgml/ref/create_index.sgml | 4 ++--
src/backend/access/common/reloptions.c | 9 +++++++++
src/backend/access/gin/gindatapage.c | 34 ++++++++++++++++++++++++----------
src/backend/access/gin/ginentrypage.c | 20 +++++++++++++++++++-
src/backend/access/gin/ginutil.c | 3 ++-
src/include/access/gin_private.h | 3 +++
6 files changed, 59 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 6b2ee28..c0ba24a 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
- storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
- accept this parameter:
+ storage parameters. The B-tree, hash, GIN, GiST and SP-GiST index methods
+ all accept this parameter:
</para>
<variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index f008fab..418ecec 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -15,6 +15,7 @@
#include "postgres.h"
+#include "access/gin_private.h"
#include "access/gist_private.h"
#include "access/hash.h"
#include "access/htup_details.h"
@@ -133,6 +134,14 @@ static relopt_int intRelOpts[] =
},
{
{
+ "fillfactor",
+ "Packs gin index pages only to this percentage",
+ RELOPT_KIND_GIN
+ },
+ GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+ },
+ {
+ {
"autovacuum_vacuum_threshold",
"Minimum number of tuple updates or deletes prior to vacuum",
RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 8e81f6c..eebfa62 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -446,11 +446,21 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
leafSegmentInfo *lastleftinfo;
ItemPointerData maxOldItem;
ItemPointerData remaining;
+ int fillfactor;
Assert(GinPageIsData(page));
rbound = *GinDataPageGetRightBound(page);
+ /* Grab option values */
+ if (btree->index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) btree->index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
+
/*
* Count how many of the new items belong to this page.
*/
@@ -511,15 +521,19 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
/*
* If we're appending to the end of the page, we will append as many items
- * as we can fit (after splitting), and stop when the pages becomes full.
- * Otherwise we have to limit the number of new items to insert, because
- * once we start packing we can't just stop when we run out of space,
- * because we must make sure that all the old items still fit.
+ * as we can fit up to the given fillfactor at build (after splitting),
+ * and stop when the pages becomes full at this rate. Otherwise we have to
+ * limit the number of new items to insert, because once we start packing
+ * we can't just stop when we run out of space, because we must make sure
+ * that all the old items still fit.
*/
- if (GinPageIsCompressed(page))
+ if (btree->isBuild)
+ freespace = BLCKSZ * (100 - fillfactor) / 100;
+ else if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
else
freespace = 0;
+
if (append)
{
/*
@@ -628,10 +642,10 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
* until they're balanced.
*
* As a further heuristic, when appending items to the end of the
- * page, try make the left page 75% full, one the assumption that
- * subsequent insertions will probably also go to the end. This packs
- * the index somewhat tighter when appending to a table, which is very
- * common.
+ * page, try make the left page full at the rate of fillfactor, one
+ * the assumption that subsequent insertions will probably also go
+ * to the end. This packs the index somewhat tighter when appending
+ * to a table, which is very common.
*/
if (!btree->isBuild)
{
@@ -654,7 +668,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
break;
if (append)
{
- if ((leaf->lsize - segsize) < (BLCKSZ * 3) / 4)
+ if ((leaf->lsize - segsize) < (BLCKSZ * fillfactor) / 100)
break;
}
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 9997eae..7bd5b30 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -462,6 +462,16 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
Size releasedsz = 0;
Size addedsz;
Page page = BufferGetPage(buf);
+ int fillfactor;
+ int freespace;
+
+ if (btree->index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) btree->index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
Assert(insertData->entry);
Assert(!GinPageIsData(page));
@@ -475,7 +485,15 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
addedsz = MAXALIGN(IndexTupleSize(insertData->entry)) + sizeof(ItemIdData);
- if (PageGetFreeSpace(page) + releasedsz >= addedsz)
+ /*
+ * Calculate freespace available. When building the index take into
+ * account the fillfactor.
+ */
+ if (btree->isBuild)
+ freespace = PageGetFreeSpace(page) - BLCKSZ * (100 - fillfactor) / 100;
+ else
+ freespace = PageGetFreeSpace(page);
+ if (freespace + releasedsz >= addedsz)
return true;
return false;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 445466b..1f7835e 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -527,7 +527,8 @@ ginoptions(PG_FUNCTION_ARGS)
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
- pendingListCleanupSize)}
+ pendingListCleanupSize)},
+ {"fillfactor", RELOPT_TYPE_INT, offsetof(GinOptions, fillfactor)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index cf2ef80..43911dd 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -323,6 +323,7 @@ typedef struct GinOptions
int32 vl_len_; /* varlena header (do not touch directly!) */
bool useFastUpdate; /* use fast updates? */
int pendingListCleanupSize; /* maximum size of pending list */
+ int fillfactor; /* page fillfactor in percent (20..100) */
} GinOptions;
#define GIN_DEFAULT_USE_FASTUPDATE true
@@ -335,6 +336,8 @@ typedef struct GinOptions
((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
gin_pending_list_limit)
+#define GIN_MIN_FILLFACTOR 20
+#define GIN_DEFAULT_FILLFACTOR 75
/* Macros for buffer lock/unlock operations */
#define GIN_UNLOCK BUFFER_LOCK_UNLOCK
--
2.2.1
On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:
[...]
I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?
Rewritten version of the patch is attached. I made the following changes:
1) I removed fillfactor handling from the entry tree, because there the
fillfactor parameter would only be an upper bound on the actual page
fullness. That is very much the GiST approach to fillfactor, and I would
rather remove fillfactor from GiST than spread that approach to other AMs.
2) Fillfactor handling for posting trees is fixed.
3) The default fillfactor for GIN is reverted to 90. I didn't mean that the
default fillfactor should be 75%; I meant that the equilibrium state of the
tree would be about 75% fullness. That doesn't mean we don't want indexes to
be better packed just after creation. Anyway, I see no reason why the default
fillfactor for the GIN btree should differ from the default fillfactor of a
regular btree.
------
With best regards,
Alexander Korotkov.
Attachments:
gin_fillfactor_3.patch
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 6b2ee28..c0ba24a
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 294,301 ****
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
! accept this parameter:
</para>
<variablelist>
--- 294,301 ----
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GIN, GiST and SP-GiST index methods
! all accept this parameter:
</para>
<variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
new file mode 100644
index f008fab..418ecec
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 15,20 ****
--- 15,21 ----
#include "postgres.h"
+ #include "access/gin_private.h"
#include "access/gist_private.h"
#include "access/hash.h"
#include "access/htup_details.h"
*************** static relopt_int intRelOpts[] =
*** 133,138 ****
--- 134,147 ----
},
{
{
+ "fillfactor",
+ "Packs gin index pages only to this percentage",
+ RELOPT_KIND_GIN
+ },
+ GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+ },
+ {
+ {
"autovacuum_vacuum_threshold",
"Minimum number of tuple updates or deletes prior to vacuum",
RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
new file mode 100644
index 8e81f6c..25efdb1
*** a/src/backend/access/gin/gindatapage.c
--- b/src/backend/access/gin/gindatapage.c
*************** typedef struct
*** 55,60 ****
--- 55,62 ----
dlist_node *lastleft; /* last segment on left page */
int lsize; /* total size on left page */
int rsize; /* total size on right page */
+ int maxdatasize; /* max data size per page
+ according to fillfactor */
bool oldformat; /* page is in pre-9.4 format on disk */
} disassembledLeaf;
*************** GinPageDeletePostingItem(Page page, Offs
*** 423,428 ****
--- 425,452 ----
}
/*
+ * Returns max data size on the page according to index fillfactor.
+ */
+ int
+ GinGetMaxDataSize(Relation index)
+ {
+ int fillfactor;
+
+ /* Grab option values */
+ if (index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ {
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
+ }
+
+ return GinDataPageMaxDataSize - BLCKSZ * (100 - fillfactor) / 100;
+ }
+
+ /*
* Places keys to leaf data page and fills WAL record.
*/
static GinPlaceToPageRC
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 485,490 ****
--- 509,515 ----
oldCxt = MemoryContextSwitchTo(tmpCxt);
leaf = disassembleLeaf(page);
+ leaf->maxdatasize = GinGetMaxDataSize(btree->index);
/*
* Are we appending to the end of the page? IOW, are all the new items
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 511,525 ****
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit (after splitting), and stop when the pages becomes full.
! * Otherwise we have to limit the number of new items to insert, because
! * once we start packing we can't just stop when we run out of space,
! * because we must make sure that all the old items still fit.
*/
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
else
freespace = 0;
if (append)
{
/*
--- 536,561 ----
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit up to the given fillfactor at build (after splitting),
! * and stop when the pages becomes full at this rate. Otherwise we have to
! * limit the number of new items to insert, because once we start packing
! * we can't just stop when we run out of space, because we must make sure
! * that all the old items still fit.
*/
if (GinPageIsCompressed(page))
+ {
freespace = GinDataLeafPageGetFreeSpace(page);
+ if (btree->isBuild)
+ {
+ freespace -= GinDataPageMaxDataSize - leaf->maxdatasize;
+ freespace = Max(0, freespace);
+ }
+ }
else
+ {
freespace = 0;
+ }
+
if (append)
{
/*
*************** disassembleLeaf(Page page)
*** 1280,1285 ****
--- 1316,1322 ----
Pointer segend;
leaf = palloc0(sizeof(disassembledLeaf));
+ leaf->maxdatasize = GinDataPageMaxDataSize;
dlist_init(&leaf->segments);
if (GinPageIsCompressed(page))
*************** leafRepackItems(disassembledLeaf *leaf,
*** 1591,1597 ****
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > GinDataPageMaxDataSize)
{
if (!needsplit)
{
--- 1628,1634 ----
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > leaf->maxdatasize)
{
if (!needsplit)
{
*************** createPostingTree(Relation index, ItemPo
*** 1683,1688 ****
--- 1720,1726 ----
Pointer ptr;
int nrootitems;
int rootsize;
+ int maxdatasize;
/* Construct the new root page in memory first. */
tmppage = (Page) palloc(BLCKSZ);
*************** createPostingTree(Relation index, ItemPo
*** 1695,1700 ****
--- 1733,1739 ----
*/
nrootitems = 0;
rootsize = 0;
+ maxdatasize = GinGetMaxDataSize(index);
ptr = (Pointer) GinDataLeafPageGetPostingList(tmppage);
while (nrootitems < nitems)
{
*************** createPostingTree(Relation index, ItemPo
*** 1707,1713 ****
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > GinDataPageMaxDataSize)
break;
memcpy(ptr, segment, segsize);
--- 1746,1752 ----
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > maxdatasize)
break;
memcpy(ptr, segment, segsize);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
new file mode 100644
index 445466b..1f7835e
*** a/src/backend/access/gin/ginutil.c
--- b/src/backend/access/gin/ginutil.c
*************** ginoptions(PG_FUNCTION_ARGS)
*** 527,533 ****
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
--- 527,534 ----
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)},
! {"fillfactor", RELOPT_TYPE_INT, offsetof(GinOptions, fillfactor)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
new file mode 100644
index cf2ef80..10420c5
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
*************** typedef struct GinOptions
*** 323,328 ****
--- 323,329 ----
int32 vl_len_; /* varlena header (do not touch directly!) */
bool useFastUpdate; /* use fast updates? */
int pendingListCleanupSize; /* maximum size of pending list */
+ int fillfactor; /* page fillfactor in percent (20..100) */
} GinOptions;
#define GIN_DEFAULT_USE_FASTUPDATE true
*************** typedef struct GinOptions
*** 335,340 ****
--- 336,343 ----
((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
gin_pending_list_limit)
+ #define GIN_MIN_FILLFACTOR 20
+ #define GIN_DEFAULT_FILLFACTOR 90
/* Macros for buffer lock/unlock operations */
#define GIN_UNLOCK BUFFER_LOCK_UNLOCK
*************** extern BlockNumber createPostingTree(Rel
*** 752,757 ****
--- 755,761 ----
GinStatsData *buildStats);
extern void GinDataPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset);
extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
+ extern int GinGetMaxDataSize(Relation index);
extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats);
On Thu, Jan 8, 2015 at 12:31 AM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:
Rewritten version of the patch is attached. I made the following changes:
[...]
I just forgot a simple case that I used to test the patch:
create table test as (select array[1,2,3] v from
generate_series(1,1000000));
create index test_idx20 on test using gin(v) with (fillfactor=20);
create index test_idx30 on test using gin(v) with (fillfactor=30);
create index test_idx40 on test using gin(v) with (fillfactor=40);
create index test_idx50 on test using gin(v) with (fillfactor=50);
create index test_idx60 on test using gin(v) with (fillfactor=60);
create index test_idx70 on test using gin(v) with (fillfactor=70);
create index test_idx80 on test using gin(v) with (fillfactor=80);
create index test_idx90 on test using gin(v) with (fillfactor=90);
create index test_idx100 on test using gin(v) with (fillfactor=100);
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------+-------+--------+-------+---------+-------------
public | test_idx20 | index | smagen | test | 16 MB |
public | test_idx30 | index | smagen | test | 11 MB |
public | test_idx40 | index | smagen | test | 8152 kB |
public | test_idx50 | index | smagen | test | 6520 kB |
public | test_idx60 | index | smagen | test | 5176 kB |
public | test_idx70 | index | smagen | test | 4480 kB |
public | test_idx80 | index | smagen | test | 3928 kB |
public | test_idx90 | index | smagen | test | 3520 kB |
public | test_idx100 | index | smagen | test | 3184 kB |
(9 rows)
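The sizes track the inverse of the fillfactor quite closely, as expected
when posting-tree pages dominate. A rough check against the fully packed
3184 kB index:
-- predicted size at fillfactor=20 from the fillfactor=100 baseline
select pg_size_pretty(3184::bigint * 1024 * 100 / 20); -- 16 MB, matching test_idx20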
------
With best regards,
Alexander Korotkov.
On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?
Rewritten version of the patch is attached. I made the following changes:
Thanks! With this patch (and my previous version as well), GIN indexes built
with the default fillfactor are larger than 9.4 indexes; 9.4 behavior is
matched only with fillfactor=100, not the default of 90. Are we fine with
that?
--
Michael
On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
[...]
Thanks! With this patch (and my previous version as well), GIN indexes built
with the default fillfactor are larger than 9.4 indexes; 9.4 behavior is
matched only with fillfactor=100, not the default of 90. Are we fine with
that?
IMO, this patch has value to control random updates on GIN indexes, but we
should have a default fillfactor of 100 to keep the index size consistent
with 9.4. Thoughts?
--
Michael
On Thu, Jan 15, 2015 at 10:19 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
[...]
IMO, this patch has value to control random updates on GIN indexes, but we
should have a default fillfactor of 100 to keep the index size consistent
with 9.4. Thoughts?
I'm not sure. On the one hand, it's unclear why the fillfactor should differ
from 9.4; on the other hand, it's unclear why it should differ from btree.
I propose marking this "ready for committer" so the committer can make the
final decision.
------
With best regards,
Alexander Korotkov.
Alexander Korotkov wrote:
I'm not sure. On the one hand, it's unclear why the fillfactor should differ
from 9.4; on the other hand, it's unclear why it should differ from btree.
I propose marking this "ready for committer" so the committer can make the
final decision.
OK let's do so then. My preference is to fully pack the index at
build. GIN compression has been one of the headlines of 9.4.
--
Michael
On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Alexander Korotkov wrote:
I'm not sure. On the one hand, it's unclear why the fillfactor should differ
from 9.4; on the other hand, it's unclear why it should differ from btree.
I propose marking this "ready for committer" so the committer can make the
final decision.
OK let's do so then. My preference is to fully pack the index at
build. GIN compression has been one of the headlines of 9.4.
I'm struggling to understand why we shouldn't just reject this patch.
On November 27th, Cedric said:
"What are the benefits of this patch? (Maybe you have a test case
or a benchmark?)"
Nobody replied. On January 15th, you (Michael) hypothesized that
"this patch has value to control random updates on GIN indexes" but
there seem to be absolutely no test results showing that any such
value exists.
There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jan 17, 2015 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?
Indeed. There is no such benchmark as of now, and I am not sure I'll
get the time to work more on that soon, so let's reject the patch for
now. And sorry for the useless noise.
--
Michael
On Fri, Jan 16, 2015 at 8:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
[...]
There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?
I already wrote a quite detailed explanation of the subject; let me try to
explain it shortly. GIN is a two-level nested btree. Thus, GIN would have
absolutely the same benefits from fillfactor as btree. The lack of tests
showing it is, for sure, my fault.
However, GIN posting trees are ordered by ItemPointer, and this makes them
somewhat special. If you have a freshly created table and do inserts or
updates, they use the end of the heap, so inserts go to the end of the GIN
posting tree and fillfactor doesn't affect anything. Fillfactor gives
benefits when HOT or heap space re-use comes into play.
In the following example you can see that the index size increased greatly
while updating every 20th row, because every update causes a page split in
the index.
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100,
fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3184 kB* │
# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *5264 kB* │
(1 row)
But if we create the index with fillfactor=90, the index size remains the
same: no page splits.
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx90 on test using gin(v) with (fillfactor=90,
fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ *3520 kB* │
# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ *3520 kB* │
(1 row)
The situation is similar if we use fastupdate. But fastupdate takes some
space for pending lists, which is independent of fillfactor.
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100,
fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3184 kB* │
# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *7256 kB* │
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=90,
fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3520 kB* │
# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *5512 kB* │
BTW, the previous version of the patch contained some bugs. A revised
version is attached.
------
With best regards,
Alexander Korotkov.
Attachments:
gin_fillfactor_4.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 6b2ee28..c0ba24a
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 294,301 ****
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
! accept this parameter:
</para>
<variablelist>
--- 294,301 ----
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GIN, GiST and SP-GiST index methods
! all accept this parameter:
</para>
<variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
new file mode 100644
index f008fab..418ecec
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 15,20 ****
--- 15,21 ----
#include "postgres.h"
+ #include "access/gin_private.h"
#include "access/gist_private.h"
#include "access/hash.h"
#include "access/htup_details.h"
*************** static relopt_int intRelOpts[] =
*** 133,138 ****
--- 134,147 ----
},
{
{
+ "fillfactor",
+ "Packs gin index pages only to this percentage",
+ RELOPT_KIND_GIN
+ },
+ GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+ },
+ {
+ {
"autovacuum_vacuum_threshold",
"Minimum number of tuple updates or deletes prior to vacuum",
RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
new file mode 100644
index 8e81f6c..25efdb1
*** a/src/backend/access/gin/gindatapage.c
--- b/src/backend/access/gin/gindatapage.c
*************** typedef struct
*** 55,60 ****
--- 55,62 ----
dlist_node *lastleft; /* last segment on left page */
int lsize; /* total size on left page */
int rsize; /* total size on right page */
+ int maxdatasize; /* max data size per page
+ according to fillfactor */
bool oldformat; /* page is in pre-9.4 format on disk */
} disassembledLeaf;
*************** GinPageDeletePostingItem(Page page, Offs
*** 423,428 ****
--- 425,452 ----
}
/*
+ * Returns max data size on the page according to index fillfactor.
+ */
+ int
+ GinGetMaxDataSize(Relation index)
+ {
+ int fillfactor;
+
+ /* Grab option values */
+ if (index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ {
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
+ }
+
+ return GinDataPageMaxDataSize - BLCKSZ * (100 - fillfactor) / 100;
+ }
+
+ /*
* Places keys to leaf data page and fills WAL record.
*/
static GinPlaceToPageRC
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 485,490 ****
--- 509,515 ----
oldCxt = MemoryContextSwitchTo(tmpCxt);
leaf = disassembleLeaf(page);
+ leaf->maxdatasize = GinGetMaxDataSize(btree->index);
/*
* Are we appending to the end of the page? IOW, are all the new items
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 511,525 ****
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit (after splitting), and stop when the pages becomes full.
! * Otherwise we have to limit the number of new items to insert, because
! * once we start packing we can't just stop when we run out of space,
! * because we must make sure that all the old items still fit.
*/
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
else
freespace = 0;
if (append)
{
/*
--- 536,561 ----
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit up to the given fillfactor at build (after splitting),
! * and stop when the pages becomes full at this rate. Otherwise we have to
! * limit the number of new items to insert, because once we start packing
! * we can't just stop when we run out of space, because we must make sure
! * that all the old items still fit.
*/
if (GinPageIsCompressed(page))
+ {
freespace = GinDataLeafPageGetFreeSpace(page);
+ if (btree->isBuild)
+ {
+ freespace -= GinDataPageMaxDataSize - leaf->maxdatasize;
+ freespace = Max(0, freespace);
+ }
+ }
else
+ {
freespace = 0;
+ }
+
if (append)
{
/*
*************** disassembleLeaf(Page page)
*** 1280,1285 ****
--- 1316,1322 ----
Pointer segend;
leaf = palloc0(sizeof(disassembledLeaf));
+ leaf->maxdatasize = GinDataPageMaxDataSize;
dlist_init(&leaf->segments);
if (GinPageIsCompressed(page))
*************** leafRepackItems(disassembledLeaf *leaf,
*** 1591,1597 ****
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > GinDataPageMaxDataSize)
{
if (!needsplit)
{
--- 1628,1634 ----
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > leaf->maxdatasize)
{
if (!needsplit)
{
*************** createPostingTree(Relation index, ItemPo
*** 1683,1688 ****
--- 1720,1726 ----
Pointer ptr;
int nrootitems;
int rootsize;
+ int maxdatasize;
/* Construct the new root page in memory first. */
tmppage = (Page) palloc(BLCKSZ);
*************** createPostingTree(Relation index, ItemPo
*** 1695,1700 ****
--- 1733,1739 ----
*/
nrootitems = 0;
rootsize = 0;
+ maxdatasize = GinGetMaxDataSize(index);
ptr = (Pointer) GinDataLeafPageGetPostingList(tmppage);
while (nrootitems < nitems)
{
*************** createPostingTree(Relation index, ItemPo
*** 1707,1713 ****
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > GinDataPageMaxDataSize)
break;
memcpy(ptr, segment, segsize);
--- 1746,1752 ----
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > maxdatasize)
break;
memcpy(ptr, segment, segsize);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
new file mode 100644
index 445466b..1f7835e
*** a/src/backend/access/gin/ginutil.c
--- b/src/backend/access/gin/ginutil.c
*************** ginoptions(PG_FUNCTION_ARGS)
*** 527,533 ****
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
--- 527,534 ----
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)},
! {"fillfactor", RELOPT_TYPE_INT, offsetof(GinOptions, fillfactor)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
new file mode 100644
index cf2ef80..10420c5
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
*************** typedef struct GinOptions
*** 323,328 ****
--- 323,329 ----
int32 vl_len_; /* varlena header (do not touch directly!) */
bool useFastUpdate; /* use fast updates? */
int pendingListCleanupSize; /* maximum size of pending list */
+ int fillfactor; /* page fillfactor in percent (20..100) */
} GinOptions;
#define GIN_DEFAULT_USE_FASTUPDATE true
*************** typedef struct GinOptions
*** 335,340 ****
--- 336,343 ----
((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
gin_pending_list_limit)
+ #define GIN_MIN_FILLFACTOR 20
+ #define GIN_DEFAULT_FILLFACTOR 90
/* Macros for buffer lock/unlock operations */
#define GIN_UNLOCK BUFFER_LOCK_UNLOCK
*************** extern BlockNumber createPostingTree(Rel
*** 752,757 ****
--- 755,761 ----
GinStatsData *buildStats);
extern void GinDataPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset);
extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
+ extern int GinGetMaxDataSize(Relation index);
extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats);
On Sat, Jan 17, 2015 at 12:49 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:
BTW, the previous version of the patch contained some bugs. A revised
version is attached.
Ooops, it's here.
------
With best regards,
Alexander Korotkov.
Attachments:
gin_fillfactor_5.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 6b2ee28..c0ba24a
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 294,301 ****
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
! accept this parameter:
</para>
<variablelist>
--- 294,301 ----
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GIN, GiST and SP-GiST index methods
! all accept this parameter:
</para>
<variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
new file mode 100644
index f008fab..418ecec
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 15,20 ****
--- 15,21 ----
#include "postgres.h"
+ #include "access/gin_private.h"
#include "access/gist_private.h"
#include "access/hash.h"
#include "access/htup_details.h"
*************** static relopt_int intRelOpts[] =
*** 133,138 ****
--- 134,147 ----
},
{
{
+ "fillfactor",
+ "Packs gin index pages only to this percentage",
+ RELOPT_KIND_GIN
+ },
+ GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+ },
+ {
+ {
"autovacuum_vacuum_threshold",
"Minimum number of tuple updates or deletes prior to vacuum",
RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
new file mode 100644
index 8e81f6c..742e114
*** a/src/backend/access/gin/gindatapage.c
--- b/src/backend/access/gin/gindatapage.c
*************** typedef struct
*** 55,60 ****
--- 55,62 ----
dlist_node *lastleft; /* last segment on left page */
int lsize; /* total size on left page */
int rsize; /* total size on right page */
+ int maxdatasize; /* max data size per page
+ according to fillfactor */
bool oldformat; /* page is in pre-9.4 format on disk */
} disassembledLeaf;
*************** GinPageDeletePostingItem(Page page, Offs
*** 423,428 ****
--- 425,456 ----
}
/*
+ * Returns max data size on the page according to index fillfactor.
+ */
+ int
+ GinGetMaxDataSize(Relation index, bool isBuild)
+ {
+ int fillfactor;
+
+ /* Grab option value which affects only index build */
+ if (!isBuild)
+ {
+ fillfactor = 100;
+ }
+ else if (index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ {
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
+ }
+
+ return GinDataPageMaxDataSize - BLCKSZ * (100 - fillfactor) / 100;
+ }
+
+ /*
* Places keys to leaf data page and fills WAL record.
*/
static GinPlaceToPageRC
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 486,491 ****
--- 514,521 ----
leaf = disassembleLeaf(page);
+ leaf->maxdatasize = GinGetMaxDataSize(btree->index, btree->isBuild);
+
/*
* Are we appending to the end of the page? IOW, are all the new items
* larger than any of the existing items.
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 511,525 ****
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit (after splitting), and stop when the pages becomes full.
! * Otherwise we have to limit the number of new items to insert, because
! * once we start packing we can't just stop when we run out of space,
! * because we must make sure that all the old items still fit.
*/
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
else
freespace = 0;
if (append)
{
/*
--- 541,566 ----
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit up to the given fillfactor at build (after splitting),
! * and stop when the pages becomes full at this rate. Otherwise we have to
! * limit the number of new items to insert, because once we start packing
! * we can't just stop when we run out of space, because we must make sure
! * that all the old items still fit.
*/
if (GinPageIsCompressed(page))
+ {
freespace = GinDataLeafPageGetFreeSpace(page);
+ if (btree->isBuild)
+ {
+ freespace -= GinDataPageMaxDataSize - leaf->maxdatasize;
+ freespace = Max(0, freespace);
+ }
+ }
else
+ {
freespace = 0;
+ }
+
if (append)
{
/*
*************** disassembleLeaf(Page page)
*** 1280,1285 ****
--- 1321,1327 ----
Pointer segend;
leaf = palloc0(sizeof(disassembledLeaf));
+ leaf->maxdatasize = GinDataPageMaxDataSize;
dlist_init(&leaf->segments);
if (GinPageIsCompressed(page))
*************** leafRepackItems(disassembledLeaf *leaf,
*** 1591,1597 ****
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > GinDataPageMaxDataSize)
{
if (!needsplit)
{
--- 1633,1639 ----
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > leaf->maxdatasize)
{
if (!needsplit)
{
*************** createPostingTree(Relation index, ItemPo
*** 1683,1688 ****
--- 1725,1731 ----
Pointer ptr;
int nrootitems;
int rootsize;
+ int maxdatasize;
/* Construct the new root page in memory first. */
tmppage = (Page) palloc(BLCKSZ);
*************** createPostingTree(Relation index, ItemPo
*** 1695,1700 ****
--- 1738,1744 ----
*/
nrootitems = 0;
rootsize = 0;
+ maxdatasize = GinGetMaxDataSize(index, buildStats ? true: false);
ptr = (Pointer) GinDataLeafPageGetPostingList(tmppage);
while (nrootitems < nitems)
{
*************** createPostingTree(Relation index, ItemPo
*** 1707,1713 ****
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > GinDataPageMaxDataSize)
break;
memcpy(ptr, segment, segsize);
--- 1751,1757 ----
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > maxdatasize)
break;
memcpy(ptr, segment, segsize);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
new file mode 100644
index 445466b..1f7835e
*** a/src/backend/access/gin/ginutil.c
--- b/src/backend/access/gin/ginutil.c
*************** ginoptions(PG_FUNCTION_ARGS)
*** 527,533 ****
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
--- 527,534 ----
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)},
! {"fillfactor", RELOPT_TYPE_INT, offsetof(GinOptions, fillfactor)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
new file mode 100644
index cf2ef80..8937343
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
*************** typedef struct GinOptions
*** 323,328 ****
--- 323,329 ----
int32 vl_len_; /* varlena header (do not touch directly!) */
bool useFastUpdate; /* use fast updates? */
int pendingListCleanupSize; /* maximum size of pending list */
+ int fillfactor; /* page fillfactor in percent (20..100) */
} GinOptions;
#define GIN_DEFAULT_USE_FASTUPDATE true
*************** typedef struct GinOptions
*** 335,340 ****
--- 336,343 ----
((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
gin_pending_list_limit)
+ #define GIN_MIN_FILLFACTOR 20
+ #define GIN_DEFAULT_FILLFACTOR 90
/* Macros for buffer lock/unlock operations */
#define GIN_UNLOCK BUFFER_LOCK_UNLOCK
*************** extern BlockNumber createPostingTree(Rel
*** 752,757 ****
--- 755,761 ----
GinStatsData *buildStats);
extern void GinDataPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset);
extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
+ extern int GinGetMaxDataSize(Relation index, bool isBuild);
extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats);
On Sat, Jan 17, 2015 at 4:49 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
I already wrote a quite detailed explanation of the subject. Let me try to
explain it shortly. GIN is a two-level nested btree. Thus, GIN would have
absolutely the same benefits from fillfactor as btree. The lack of tests
showing it is, for sure, my fault.
However, GIN posting trees are ordered by ItemPointer, and this creates
some specifics. If you have a freshly created table and do inserts/updates,
they would use the end of the heap. Thus, inserts would go to the end of
the GIN posting tree and fillfactor wouldn't affect anything. Fillfactor
would give benefits on HOT or heap space re-usage.
Ah, OK. Those tests clarify things considerably; I see the point now.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Monday 19 January 2015 08:24:08, Robert Haas wrote:
On Sat, Jan 17, 2015 at 4:49 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
I already wrote a quite detailed explanation of the subject. Let me try to
explain it shortly. GIN is a two-level nested btree. Thus, GIN would have
absolutely the same benefits from fillfactor as btree. The lack of tests
showing it is, for sure, my fault.
However, GIN posting trees are ordered by ItemPointer, and this creates
some specifics. If you have a freshly created table and do inserts/updates,
they would use the end of the heap. Thus, inserts would go to the end of
the GIN posting tree and fillfactor wouldn't affect anything. Fillfactor
would give benefits on HOT or heap space re-usage.
Ah, OK. Those tests clarify things considerably; I see the point now.
So do I.
Alexander said:
1) In order to have fully correct support of fillfactor in GIN, we need to
rewrite the GIN build algorithm.
2) Without rewriting the GIN build algorithm, not much can be done with the
entry tree. However, you can implement some heuristics.
The patch is 2), for the posting tree only?
Thanks!
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Development, Expertise and Training
On Mon, Jan 19, 2015 at 5:46 PM, Cédric Villemain <cedric@2ndquadrant.com>
wrote:
On Monday 19 January 2015 08:24:08, Robert Haas wrote:
On Sat, Jan 17, 2015 at 4:49 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
I already wrote a quite detailed explanation of the subject. Let me try to
explain it shortly. GIN is a two-level nested btree. Thus, GIN would have
absolutely the same benefits from fillfactor as btree. The lack of tests
showing it is, for sure, my fault.
However, GIN posting trees are ordered by ItemPointer, and this creates
some specifics. If you have a freshly created table and do inserts/updates,
they would use the end of the heap. Thus, inserts would go to the end of
the GIN posting tree and fillfactor wouldn't affect anything. Fillfactor
would give benefits on HOT or heap space re-usage.
Ah, OK. Those tests clarify things considerably; I see the point now.
So do I.
Alexander said:
1) In order to have fully correct support of fillfactor in GIN, we need to
rewrite the GIN build algorithm.
2) Without rewriting the GIN build algorithm, not much can be done with the
entry tree. However, you can implement some heuristics.
The patch is 2), for the posting tree only?
Yes, it's just for the posting tree.
------
With best regards,
Alexander Korotkov.
On Saturday 17 January 2015 08:23:44, Michael Paquier wrote:
On Sat, Jan 17, 2015 at 2:40 AM, Robert Haas
<robertmhaas@gmail.com> wrote:
There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?
Indeed. There is no such benchmark as of now, and I am not sure I'll
get the time to work more on that soon, so let's reject the patch for
now. And sorry for the useless noise.
Michael, I first didn't understand how GIN can benefit from the
patch...thus my question.
There was no noise for me, and I learned some more about GIN. So I thank
you for your work!
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Development, Expertise and Training
On Tue, Jan 20, 2015 at 12:06 AM, Cédric Villemain
<cedric@2ndquadrant.com> wrote:
Michael, I first didn't understand how GIN can benefit from the
patch...thus my question.
There was no noise for me, and I learned some more about GIN. So I thank
you for your work!
Kicking back the ball, I haven't done as much work as I should have.
Alexander, I think that we should continue on this patch in the next
CF. Are you fine if an entry is added with you as author? I'll drop
myself as author (not that I cannot find the time, just that I am sort
of useless).
--
Michael
Hi,
I've been wondering whether this might improve behavior with one of my
workloads, suffering from GIN bloat - the same one I used to test GIN
fastscan, for example.
It's a batch process that loads a mailing list archive into a table with
a GIN index on the message body, by doing something like this:
for file in files:
    BEGIN;
    for message in file:
        SAVEPOINT s;
        INSERT INTO messages VALUES (...)
        if error:
            ROLLBACK TO s;
    COMMIT;
And there are multiple processes, each processing a subset of mbox files.
There are ~1M messages and right after the load I see this:
List of relations
Schema | Name | Type | Owner | Table | Size
--------+------------------+-------+-------+----------+---------
public | message_body_idx | index | tomas | messages | 2247 MB
(1 row)
and after VACUUM FULL:
List of relations
Schema | Name | Type | Owner | Table | Size
--------+------------------+-------+-------+----------+---------
public | message_body_idx | index | tomas | messages | 403 MB
(1 row)
So the index is ~5x larger, which is probably expected due to the amount
of random inserts within a very short time (~15 minutes), executed in
parallel.
I hoped lowering the fillfactor would improve this, but fillfactor=75 had
pretty much no effect in this case. Is that expected for this kind of
workload? I see the previous discussion talked about random updates, not
inserts, so maybe that's the culprit?
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 24, 2015 at 5:15 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
I hoped lowering the fillfactor would improve this, but fillfactor=75 had
pretty much no effect in this case. Is that expected for this kind of
workload? I see the previous discussion talked about random updates, not
inserts, so maybe that's the culprit?
Yes. Since posting trees are ordered by item pointers, you can benefit
from fillfactor only if you use some item pointers lower than the item
pointers already in use. You can still benefit in the insert case, but you
should already have some free space in the heap (perhaps do some deletes
and a vacuum).
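For illustration, a minimal sketch along these lines, reusing the test
table from my earlier examples (exact numbers will vary):
# delete from test where id % 10 = 0;  -- free some heap space
# vacuum test;                         -- make the freed space reusable
# insert into test (id, v) select id, array[1,2,3] from generate_series(1,100000) id;
The new rows get item pointers inside the freed heap pages, i.e. below the
current end of the posting tree, so the space reserved by fillfactor in the
existing leaf pages can absorb them without page splits.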
Actually, this narrows the benefit of GIN fillfactor. Probably, that means
we should still have a default value of 100. But I think GIN fillfactor
might still be useful.
------
With best regards,
Alexander Korotkov.
On 25.2.2015 10:20, Alexander Korotkov wrote:
On Tue, Feb 24, 2015 at 5:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
I hoped lowering the fillfactor would improve this, but
fillfactor=75 had pretty much no effect in this case. Is that
expected for this kind of workload? I see the previous discussion
talked about random updates, not inserts, so maybe that's the
culprit?
Yes. Since posting trees are ordered by item pointers, you can benefit
from fillfactor only if you use some item pointers lower than the item
pointers already in use. You can still benefit in the insert case, but
you should already have some free space in the heap (perhaps do some
deletes and a vacuum).
OK, understood. Thanks for the explanation.
Actually, this narrows the benefit of GIN fillfactor. Probably, that
means we should still have a default value of 100. But I think GIN
fillfactor might still be useful.
I'm not sure about that. It's true that lowering fillfactor to 90 does
not help this particular workload, but it does not hurt it either. For
other common workloads (updating the values, not just inserting them),
lowering the fillfactor to 90 may easily be a big improvement.
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
In the dataPlaceToPageLeaf function:
if (append)
{
    /*
     * Even when appending, trying to append more items than will fit is
     * not completely free, because we will merge the new items and old
     * items into an array below. In the best case, every new item fits in
     * a single byte, and we can use all the free space on the old page as
     * well as the new page. For simplicity, ignore segment overhead etc.
     */
    maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
}
Hmm. So after splitting the page, there is freespace +
GinDataPageMaxDataSize bytes available on both pages together. freespace
has been adjusted with the fillfactor, but GinDataPageMaxDataSize is
not. So this overshoots, because when leafRepackItems actually
distributes the segments on the pages, it will fill both pages only up
to the fillfactor. This is an upper bound, so that's harmless, it only
leads to some unnecessary work in dealing with the item lists. But I
think that should be:
maxitems = Min(maxitems, freespace + leaf->maxdatasize);
else
{
    /*
     * Calculate a conservative estimate of how many new items we can fit
     * on the two pages after splitting.
     *
     * We can use any remaining free space on the old page to store full
     * segments, as well as the new page. Each full-sized segment can hold
     * at least MinTuplesPerSegment items
     */
    int nnewsegments;

    nnewsegments = freespace / GinPostingListSegmentMaxSize;
    nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
    maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
}
This branch makes the same mistake, but this is calculating a lower
bound. It's important that maxitems is not set to a higher value than what
actually fits on the page, otherwise you can get an ERROR later in the
function, when it turns out that not all the items actually fit on the
page. The saving grace here is that this branch is never taken when
building a new index, because index build inserts all the TIDs in order,
but that seems pretty fragile. Should use leaf->maxdatasize instead of
GinDataPageMaxDataSize here too.
But that can lead to funny things, if fillfactor is small, and BLCKSZ is
small too. With the minimums, BLCKSZ=1024 and fillfactor=20, the above
formula will set nnewsegments to zero. That's not going to end up well.
The problem is that maxdatasize becomes smaller than
GinPostingListSegmentMaxSize, which is a problem. I think
GinGetMaxDataSize() needs to make sure that the returned value is always
>= GinPostingListSegmentMaxSize.
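To put rough numbers on that (taking GinDataPageMaxDataSize to be about
980 bytes at BLCKSZ=1024 and GinPostingListSegmentMaxSize to be 384 bytes;
treat both values as assumptions, the exact figures differ slightly):
maxdatasize = 980 - 1024 * (100 - 20) / 100 = 980 - 819 = 161
161 / GinPostingListSegmentMaxSize = 161 / 384 = 0
So the estimate would conclude that not even one full segment fits on a
fresh page.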
Now that we have a fillfactor, shouldn't we replace the 75% heuristic
later in that function, when inserting to the rightmost page rather than
building it from scratch? In B-tree, the fillfactor is applied to that
case too.
- Heikki
Hi!
On Wed, Jul 8, 2015 at 10:27 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
In the dataPlaceToPageLeaf function:
if (append)
{
    /*
     * Even when appending, trying to append more items than will fit is
     * not completely free, because we will merge the new items and old
     * items into an array below. In the best case, every new item fits in
     * a single byte, and we can use all the free space on the old page as
     * well as the new page. For simplicity, ignore segment overhead etc.
     */
    maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
}
Hmm. So after splitting the page, there is freespace +
GinDataPageMaxDataSize bytes available on both pages together. freespace
has been adjusted with the fillfactor, but GinDataPageMaxDataSize is not.
So this overshoots, because when leafRepackItems actually distributes the
segments on the pages, it will fill both pages only up to the fillfactor.
This is an upper bound, so that's harmless, it only leads to some
unnecessary work in dealing with the item lists. But I think that should be:
maxitems = Min(maxitems, freespace + leaf->maxdatasize);
Good catch! Fixed.
else
{
    /*
     * Calculate a conservative estimate of how many new items we can fit
     * on the two pages after splitting.
     *
     * We can use any remaining free space on the old page to store full
     * segments, as well as the new page. Each full-sized segment can hold
     * at least MinTuplesPerSegment items
     */
    int nnewsegments;

    nnewsegments = freespace / GinPostingListSegmentMaxSize;
    nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
    maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
}
This branch makes the same mistake, but this is calculating a lower bound.
It's important that maxitems is not set to a higher value than what actually
fits on the page, otherwise you can get an ERROR later in the function,
when it turns out that not all the items actually fit on the page. The
saving grace here is that this branch is never taken when building a new
index, because index build inserts all the TIDs in order, but that seems
pretty fragile. Should use leaf->maxdatasize instead of
GinDataPageMaxDataSize here too.
Fixed.
But that can lead to funny things, if fillfactor is small, and BLCKSZ is
small too. With the minimums, BLCKSZ=1024 and fillfactor=20, the above
formula will set nnewsegments to zero. That's not going to end up well. The
problem is that maxdatasize becomes smaller than
GinPostingListSegmentMaxSize, which is a problem. I think
GinGetMaxDataSize() needs to make sure that the returned value is always >=
GinPostingListSegmentMaxSize.
Fixed.
Now that we have a fillfactor, shouldn't we replace the 75% heuristic
later in that function, when inserting to the rightmost page rather than
building it from scratch? In B-tree, the fillfactor is applied to that case
too.
Sounds reasonable. Now it works like btree.
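For instance, appending to a table with this version should now pack the
rightmost posting-tree leaf up to the fillfactor instead of the old
hard-coded 75%. A rough sketch (table and index names are made up; the
exact page fill depends on segment sizes):
# create table t (v int[]);
# create index t_gin on t using gin (v) with (fillfactor = 90, fastupdate = off);
# insert into t select array[1] from generate_series(1, 100000);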
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
gin_fillfactor_6.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index ce36a1b..9a7b5af
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 294,301 ****
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
! accept this parameter:
</para>
<variablelist>
--- 294,301 ----
<para>
The optional <literal>WITH</> clause specifies <firstterm>storage
parameters</> for the index. Each index method has its own set of allowed
! storage parameters. The B-tree, hash, GIN, GiST and SP-GiST index methods
! all accept this parameter:
</para>
<variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
new file mode 100644
index 8176b6a..443b125
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 15,20 ****
--- 15,21 ----
#include "postgres.h"
+ #include "access/gin_private.h"
#include "access/gist_private.h"
#include "access/hash.h"
#include "access/htup_details.h"
*************** static relopt_int intRelOpts[] =
*** 133,138 ****
--- 134,147 ----
},
{
{
+ "fillfactor",
+ "Packs gin index pages only to this percentage",
+ RELOPT_KIND_GIN
+ },
+ GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+ },
+ {
+ {
"autovacuum_vacuum_threshold",
"Minimum number of tuple updates or deletes prior to vacuum",
RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
new file mode 100644
index ec8c94b..35c3761
*** a/src/backend/access/gin/gindatapage.c
--- b/src/backend/access/gin/gindatapage.c
*************** typedef struct
*** 55,60 ****
--- 55,62 ----
dlist_node *lastleft; /* last segment on left page */
int lsize; /* total size on left page */
int rsize; /* total size on right page */
+ int maxdatasize; /* max data size per page
+ according to fillfactor */
bool oldformat; /* page is in pre-9.4 format on disk */
} disassembledLeaf;
*************** GinPageDeletePostingItem(Page page, Offs
*** 423,428 ****
--- 425,461 ----
}
/*
+ * Returns max data size on the page according to index fillfactor.
+ */
+ int
+ GinGetMaxDataSize(Relation index, bool isBuild)
+ {
+ int fillfactor, result;
+
+ /* Grab option value which affects only index build */
+ if (!isBuild)
+ {
+ fillfactor = 100;
+ }
+ else if (index->rd_options)
+ {
+ GinOptions *options = (GinOptions *) index->rd_options;
+ fillfactor = options->fillfactor;
+ }
+ else
+ {
+ fillfactor = GIN_DEFAULT_FILLFACTOR;
+ }
+
+ /* Caclculate max data size on page according to fillfactor */
+ result = GinDataPageMaxDataSize - BLCKSZ * (100 - fillfactor) / 100;
+ /* Ensure that at least one posting list segment fits the page */
+ result = Max(result, GinPostingListSegmentMaxSize);
+
+ return result;
+ }
+
+ /*
* Places keys to leaf data page and fills WAL record.
*/
static GinPlaceToPageRC
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 486,491 ****
--- 519,526 ----
leaf = disassembleLeaf(page);
+ leaf->maxdatasize = GinGetMaxDataSize(btree->index, btree->isBuild);
+
/*
* Are we appending to the end of the page? IOW, are all the new items
* larger than any of the existing items.
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 511,525 ****
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit (after splitting), and stop when the pages becomes full.
! * Otherwise we have to limit the number of new items to insert, because
! * once we start packing we can't just stop when we run out of space,
! * because we must make sure that all the old items still fit.
*/
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
else
freespace = 0;
if (append)
{
/*
--- 546,571 ----
/*
* If we're appending to the end of the page, we will append as many items
! * as we can fit up to the given fillfactor at build (after splitting),
! * and stop when the pages becomes full at this rate. Otherwise we have to
! * limit the number of new items to insert, because once we start packing
! * we can't just stop when we run out of space, because we must make sure
! * that all the old items still fit.
*/
if (GinPageIsCompressed(page))
+ {
freespace = GinDataLeafPageGetFreeSpace(page);
+ if (btree->isBuild)
+ {
+ freespace -= GinDataPageMaxDataSize - leaf->maxdatasize;
+ freespace = Max(0, freespace);
+ }
+ }
else
+ {
freespace = 0;
+ }
+
if (append)
{
/*
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 529,535 ****
* a single byte, and we can use all the free space on the old page as
* well as the new page. For simplicity, ignore segment overhead etc.
*/
! maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
}
else
{
--- 575,581 ----
* a single byte, and we can use all the free space on the old page as
* well as the new page. For simplicity, ignore segment overhead etc.
*/
! maxitems = Min(maxitems, freespace + leaf->maxdatasize);
}
else
{
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 544,550 ****
int nnewsegments;
nnewsegments = freespace / GinPostingListSegmentMaxSize;
! nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
}
--- 590,596 ----
int nnewsegments;
nnewsegments = freespace / GinPostingListSegmentMaxSize;
! nnewsegments += leaf->maxdatasize / GinPostingListSegmentMaxSize;
maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
}
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 631,643 ****
* until they're balanced.
*
* As a further heuristic, when appending items to the end of the
! * page, try make the left page 75% full, one the assumption that
! * subsequent insertions will probably also go to the end. This packs
! * the index somewhat tighter when appending to a table, which is very
! * common.
*/
if (!btree->isBuild)
{
while (dlist_has_prev(&leaf->segments, leaf->lastleft))
{
lastleftinfo = dlist_container(leafSegmentInfo, node, leaf->lastleft);
--- 677,692 ----
* until they're balanced.
*
* As a further heuristic, when appending items to the end of the
! * page, try make the left page fillfactor% full, on the assumption
! * that subsequent insertions will probably also go to the end. This
! * packs the posting tree as tight as during index build when appending
! * to a table, which is very common.
*/
if (!btree->isBuild)
{
+ /* Calculate max data size like it's an index build */
+ int lmaxsize = GinGetMaxDataSize(btree->index, true);
+
while (dlist_has_prev(&leaf->segments, leaf->lastleft))
{
lastleftinfo = dlist_container(leafSegmentInfo, node, leaf->lastleft);
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 657,663 ****
break;
if (append)
{
! if ((leaf->lsize - segsize) < (BLCKSZ * 3) / 4)
break;
}
--- 706,712 ----
break;
if (append)
{
! if ((leaf->lsize - segsize) < lmaxsize)
break;
}
*************** dataPlaceToPageLeaf(GinBtree btree, Buff
*** 667,672 ****
--- 716,722 ----
leaf->lastleft = dlist_prev_node(&leaf->segments, leaf->lastleft);
}
}
+
Assert(leaf->lsize <= GinDataPageMaxDataSize);
Assert(leaf->rsize <= GinDataPageMaxDataSize);
*************** disassembleLeaf(Page page)
*** 1284,1289 ****
--- 1334,1340 ----
Pointer segend;
leaf = palloc0(sizeof(disassembledLeaf));
+ leaf->maxdatasize = GinDataPageMaxDataSize;
dlist_init(&leaf->segments);
if (GinPageIsCompressed(page))
*************** leafRepackItems(disassembledLeaf *leaf,
*** 1595,1601 ****
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > GinDataPageMaxDataSize)
{
if (!needsplit)
{
--- 1646,1652 ----
* copying to the page. Did we exceed the size that fits on one page?
*/
segsize = SizeOfGinPostingList(seginfo->seg);
! if (pgused + segsize > leaf->maxdatasize)
{
if (!needsplit)
{
*************** createPostingTree(Relation index, ItemPo
*** 1687,1692 ****
--- 1738,1744 ----
Pointer ptr;
int nrootitems;
int rootsize;
+ int maxdatasize;
/* Construct the new root page in memory first. */
tmppage = (Page) palloc(BLCKSZ);
*************** createPostingTree(Relation index, ItemPo
*** 1699,1704 ****
--- 1751,1757 ----
*/
nrootitems = 0;
rootsize = 0;
+ maxdatasize = GinGetMaxDataSize(index, buildStats ? true: false);
ptr = (Pointer) GinDataLeafPageGetPostingList(tmppage);
while (nrootitems < nitems)
{
*************** createPostingTree(Relation index, ItemPo
*** 1711,1717 ****
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > GinDataPageMaxDataSize)
break;
memcpy(ptr, segment, segsize);
--- 1764,1770 ----
GinPostingListSegmentMaxSize,
&npacked);
segsize = SizeOfGinPostingList(segment);
! if (rootsize + segsize > maxdatasize)
break;
memcpy(ptr, segment, segsize);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
new file mode 100644
index cb4e32f..7403366
*** a/src/backend/access/gin/ginutil.c
--- b/src/backend/access/gin/ginutil.c
*************** ginoptions(PG_FUNCTION_ARGS)
*** 527,533 ****
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
--- 527,534 ----
static const relopt_parse_elt tab[] = {
{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
{"gin_pending_list_limit", RELOPT_TYPE_INT, offsetof(GinOptions,
! pendingListCleanupSize)},
! {"fillfactor", RELOPT_TYPE_INT, offsetof(GinOptions, fillfactor)}
};
options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
new file mode 100644
index 5f214d7..586389d
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
*************** typedef struct GinOptions
*** 323,328 ****
--- 323,329 ----
int32 vl_len_; /* varlena header (do not touch directly!) */
bool useFastUpdate; /* use fast updates? */
int pendingListCleanupSize; /* maximum size of pending list */
+ int fillfactor; /* page fillfactor in percent (20..100) */
} GinOptions;
#define GIN_DEFAULT_USE_FASTUPDATE true
*************** typedef struct GinOptions
*** 335,340 ****
--- 336,343 ----
((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
gin_pending_list_limit)
+ #define GIN_MIN_FILLFACTOR 20
+ #define GIN_DEFAULT_FILLFACTOR 90
/* Macros for buffer lock/unlock operations */
#define GIN_UNLOCK BUFFER_LOCK_UNLOCK
*************** extern BlockNumber createPostingTree(Rel
*** 724,729 ****
--- 727,733 ----
GinStatsData *buildStats);
extern void GinDataPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset);
extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
+ extern int GinGetMaxDataSize(Relation index, bool isBuild);
extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats);
On Thu, Jul 9, 2015 at 10:33 PM, Alexander Korotkov wrote:
[...]
+ /* Caclculate max data size on page according to fillfactor */
s/Caclculate/Calculate
When creating a simple gin index, I am seeing that GinGetMaxDataSize gets
called with ginEntryInsert:
* frame #0: 0x000000010a49d72e
postgres`GinGetMaxDataSize(index=0x0000000114168378, isBuild='\x01') + 14
at gindatapage.c:436
frame #1: 0x000000010a49ecbe
postgres`createPostingTree(index=0x0000000114168378,
items=0x0000000114a9c038, nitems=35772, buildStats=0x00007fff55787350) +
302 at gindatapage.c:1754
frame #2: 0x000000010a493423
postgres`buildFreshLeafTuple(ginstate=0x00007fff55784d90, attnum=1,
key=2105441, category='\0', items=0x0000000114a9c038, nitem=35772,
buildStats=0x00007fff55787350) + 339 at gininsert.c:158
frame #3: 0x000000010a492f84
postgres`ginEntryInsert(ginstate=0x00007fff55784d90, attnum=1, key=2105441,
category='\0', items=0x0000000114a9c038, nitem=35772,
buildStats=0x00007fff55787350) + 916 at gininsert.c:228
And as far as I can see GinGetMaxDataSize uses isBuild:
+ int
+ GinGetMaxDataSize(Relation index, bool isBuild)
+ {
+ int fillfactor, result;
+
+ /* Grab option value which affects only index build */
+ if (!isBuild)
However isBuild is not set in this code path, see
http://www.postgresql.org/message-id/CAB7nPqSc4VQ9mHKqm_YvAfcTEhO-iUY8SKbXYdnMGnAi1XnPaw@mail.gmail.com
where I reported the problem. So this code is somewhat broken, no? I am
attaching to this email the patch in question that should be applied on top
of Alexander's latest version.
+ #define GIN_MIN_FILLFACTOR 20
+ #define GIN_DEFAULT_FILLFACTOR 90
I am still worried about using a default fillfactor of 90, as we did a lot
of promotion about compression of GIN indexes in 9.4. Feel free to ignore
this comment if you think that 90 is a good default. The difference is
visibly in the order of MBs even for large indexes; still, this is changing
the default behavior of 9.4 and 9.5.
Regards,
--
Michael
Attachments:
20141120_gin_entry_build_fix.patch (text/x-diff; charset=US-ASCII)
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 370884e..229167b 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -191,6 +191,7 @@ ginEntryInsert(GinState *ginstate,
buildStats->nEntries++;
ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
+ btree.isBuild = (buildStats != NULL);
stack = ginFindLeafPage(&btree, false);
page = BufferGetPage(stack->buffer);
On Fri, Jul 10, 2015 at 4:54 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Thu, Jul 9, 2015 at 10:33 PM, Alexander Korotkov wrote:
[...]
+ /* Caclculate max data size on page according to fillfactor */
s/Caclculate/Calculate
When creating a simple gin index, I am seeing that GinGetMaxDataSize gets
called with ginEntryInsert:
* frame #0: 0x000000010a49d72e
postgres`GinGetMaxDataSize(index=0x0000000114168378, isBuild='\x01') + 14
at gindatapage.c:436
frame #1: 0x000000010a49ecbe
postgres`createPostingTree(index=0x0000000114168378,
items=0x0000000114a9c038, nitems=35772, buildStats=0x00007fff55787350) +
302 at gindatapage.c:1754
frame #2: 0x000000010a493423
postgres`buildFreshLeafTuple(ginstate=0x00007fff55784d90, attnum=1,
key=2105441, category='\0', items=0x0000000114a9c038, nitem=35772,
buildStats=0x00007fff55787350) + 339 at gininsert.c:158
frame #3: 0x000000010a492f84
postgres`ginEntryInsert(ginstate=0x00007fff55784d90, attnum=1, key=2105441,
category='\0', items=0x0000000114a9c038, nitem=35772,
buildStats=0x00007fff55787350) + 916 at gininsert.c:228And as far as I can see GinGetMaxDataSize uses isBuild: + int + GinGetMaxDataSize(Relation index, bool isBuild) + { + int fillfactor, result; + + /* Grab option value which affects only index build */ + if (!isBuild) However isBuild is not set in this code path, see http://www.postgresql.org/message-id/CAB7nPqSc4VQ9mHKqm_YvAfcTEhO-iUY8SKbXYdnMGnAi1XnPaw@mail.gmail.com where I reported the problem. So this code is somewhat broken, no? I am attaching to this email the patch in question that should be applied on top of Alexander's latest version.
I didn't get what is problem. Was this stacktrace during index build? If
so, GinGetMaxDataSize really should get isBuild='\x01'. It is set by
following line
maxdatasize = GinGetMaxDataSize(index, buildStats ? true: false);
+ #define GIN_MIN_FILLFACTOR 20 + #define GIN_DEFAULT_FILLFACTOR 90 I am still worrying about using a default fillfactor at 90, as we did a lot of promotion about compression of GIN indexes in 9.4. Feel free to ignore this comment if you think that 90 is a good default. The difference is visibly in order of MBs even for large indexes still, this is changing the default of 9.4 and 9.5.
On the other side, it's unclear why should we have fillfactor distinct from
btree..
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 07/10/2015 01:13 PM, Alexander Korotkov wrote:
On Fri, Jul 10, 2015 at 4:54 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
+ #define GIN_MIN_FILLFACTOR 20
+ #define GIN_DEFAULT_FILLFACTOR 90
I am still worried about using a default fillfactor of 90, as we did a lot
of promotion about compression of GIN indexes in 9.4. Feel free to ignore
this comment if you think that 90 is a good default. The difference is
visibly in the order of MBs even for large indexes; still, this is changing
the default behavior of 9.4 and 9.5.
On the other hand, it's unclear why we should have a fillfactor distinct
from btree..
Good question. One argument is that the items on a GIN posting page
are much smaller (2-3 bytes typically) than index tuples in a B-tree
page (32 bytes or more). By a rough calculation, in the 10% free space
you leave in an index page, which is about 800 bytes, you can insert at
most 25 or so tuples. In the same amount of free space in a GIN page,
you can fit hundreds of items. The fillfactor is particularly useful to
avoid splitting a lot of pages after creating a new index, when you do a
few random updates.
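Spelling out the arithmetic (the per-item sizes above are rough
assumptions):
800 / 32 = 25 b-tree index tuples in the reserved free space
800 / 3 = ~266 GIN posting items in the same space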
Another argument is that for the typical use cases of GIN, tuples are
not updated as often as with B-tree indexes.
A third argument is that before this patch, we always packed the index
as full as possible (i.e. fillfactor=100), and we want to avoid
"changing the default" so much.
I'm not sure any of those arguments are very strong, but my gut feeling
is that 90 might indeed be too small. Perhaps set the default to 95..
I'm curious why the minimum in the patch is 20, while the minimum for
b-tree is 10, though. I don't see any particularly good reason for that.
- Heikki
On Fri, Jul 10, 2015 at 7:13 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Fri, Jul 10, 2015 at 4:54 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Jul 9, 2015 at 10:33 PM, Alexander Korotkov wrote:
[...]
+ /* Caclculate max data size on page according to fillfactor */
s/Caclculate/Calculate
When creating a simple gin index, I am seeing that GinGetMaxDataSize gets
called with ginEntryInsert:
* frame #0: 0x000000010a49d72e
postgres`GinGetMaxDataSize(index=0x0000000114168378, isBuild='\x01') + 14
at gindatapage.c:436
frame #1: 0x000000010a49ecbe
postgres`createPostingTree(index=0x0000000114168378,
items=0x0000000114a9c038, nitems=35772, buildStats=0x00007fff55787350) +
302 at gindatapage.c:1754
frame #2: 0x000000010a493423
postgres`buildFreshLeafTuple(ginstate=0x00007fff55784d90, attnum=1,
key=2105441, category='\0', items=0x0000000114a9c038, nitem=35772,
buildStats=0x00007fff55787350) + 339 at gininsert.c:158
frame #3: 0x000000010a492f84
postgres`ginEntryInsert(ginstate=0x00007fff55784d90, attnum=1, key=2105441,
category='\0', items=0x0000000114a9c038, nitem=35772,
buildStats=0x00007fff55787350) + 916 at gininsert.c:228
And as far as I can see GinGetMaxDataSize uses isBuild:
+ int
+ GinGetMaxDataSize(Relation index, bool isBuild)
+ {
+ int fillfactor, result;
+
+ /* Grab option value which affects only index build */
+ if (!isBuild)
However isBuild is not set in this code path, see
http://www.postgresql.org/message-id/CAB7nPqSc4VQ9mHKqm_YvAfcTEhO-iUY8SKbXYdnMGnAi1XnPaw@mail.gmail.com
where I reported the problem. So this code is somewhat broken, no? I am
attaching to this email the patch in question that should be applied on top
of Alexander's latest version.
I didn't get what the problem is. Was this stacktrace taken during index
build? If so, GinGetMaxDataSize really should get isBuild='\x01'. It is set
by the following line:
maxdatasize = GinGetMaxDataSize(index, buildStats ? true: false);
Yeah, I just find it confusing that we actually rely on whether buildStats
is NULL or not here instead of passing directly the isBuild flag of
GinBtree, for example. But that's a point of detail..
--
Michael
Has anyone done any performance testing of this?
The purpose of a fillfactor is to avoid the storm of page splits right
after building the index, when there are some random updates to the
table. It causes the index to bloat, as every full page is split to two
half-full pages, and also adds latency to all the updates that have to
split pages.
As the code stands, do we actually have such a problem with GIN? Let's
investigate. Let's create a little test table:
create table foo (id int4, t text);
insert into foo select g, 'foo' from generate_series(1, 1000000) g;
And some indexes on it:
-- B-tree index on id, 100%
create index btree_100 on foo (id) with (fillfactor=100);
-- B-tree index on id, 90% (which is the default)
create index btree_90 on foo (id) with (fillfactor=90);
-- GIN index on id. Id is different on each row, so this index has no
posting trees, just the entry tree.
create index gin_id on foo using gin (id) with (fastupdate=off);
-- GIN index on t. T is the same on each row, so this index consists of
a single posting tree.
create index gin_t on foo using gin (t) with (fastupdate=off);
Immediately after creation, the index sizes are:
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------+-------+--------+-------+---------+-------------
public | btree_100 | index | heikki | foo | 19 MB |
public | btree_90 | index | heikki | foo | 21 MB |
public | gin_id | index | heikki | foo | 53 MB |
public | gin_t | index | heikki | foo | 1072 kB |
(4 rows)
Now let's update 1% of the table, spread evenly across the table, and
see what happens to the index sizes:
postgres=# update foo set id = id where id % 100 = 0;
UPDATE 10000
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------+-------+--------+-------+---------+-------------
public | btree_100 | index | heikki | foo | 39 MB |
public | btree_90 | index | heikki | foo | 21 MB |
public | gin_id | index | heikki | foo | 53 MB |
public | gin_t | index | heikki | foo | 1080 kB |
(4 rows)
As you can see, the btree_100 index doubled in size. That's the effect we're
trying to avoid with the fillfactor, and indeed in the btree_90 index it
was avoided. However, the GIN indexes are not bloated either. Why is that?
The entry tree (demonstrated by the gin_id index) is not packed tightly
when it's built. If anything, we could pack it more tightly to make it
smaller to begin with. For the use cases where GIN works best, though,
the entry tree should be fairly small compared to the posting lists, so
it shouldn't make a big difference. For the posting tree, I think that's
because of the way the items are packed in the segments. Even though we
pack the segments as tightly as possible, there is always some free
space left over on a page: not enough to add another segment at index
build, but usable by later updates to add items to the existing
segments. So in practice you always end up with some free space on the
posting tree pages, even without a fillfactor, and the "effective
fillfactor" even today is not actually 100%.
Based on this, I think we should just drop this patch. It's not useful
in practice.
- Heikki
On Tue, Jul 21, 2015 at 12:40 PM, Heikki Linnakangas <hlinnaka@iki.fi>
wrote:
Has anyone done any performance testing of this?
[...]
So in practice you always end up with some free space on the posting tree
pages, even without a fillfactor, and the "effective fillfactor" even
today is not actually 100%.
There are two reasons for this behavior. You mentioned the first one: the
"effective fillfactor" for posting lists is not 100%. But the second one is
the structure of posting lists. In your example the UPDATE allocates the
new tuple at the end of the heap, so from the point of view of the posting
lists these updates are appends. The situation is different when you reuse
space in the heap. See the following example.
drop table foo;
create table foo (id integer, val int[]) with (fillfactor = 90);
insert into foo (select g, array[g%2]::int[] from
generate_series(1,1000000) g);
create index foo_val_idx_100 on foo using gin(val) with (fastupdate=off,
fillfactor=100);
create index foo_val_idx_90 on foo using gin(val) with (fastupdate=off,
fillfactor=90);
# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------------+-------+--------+-------+---------+-------------
public | foo_val_idx_100 | index | smagen | foo | 1088 kB |
public | foo_val_idx_90 | index | smagen | foo | 1200 kB |
(2 rows)
update foo set val = array[(id+1)%2]::int[] where id % 50 = 0;
# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------------+-------+--------+-------+---------+-------------
public | foo_val_idx_100 | index | smagen | foo | 1608 kB |
public | foo_val_idx_90 | index | smagen | foo | 1200 kB |
(2 rows)
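To see that the heap space is really being reused here rather than
extended, one can peek at where the new tuple versions landed; just a
sketch, the exact TIDs will vary:
-- block numbers spread across the table (not at its end) indicate reuse
select ctid from foo where id % 50 = 0 limit 5;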
Based on this, I think we should just drop this patch. It's not useful in
practice.
I wouldn't say it's not useful at all. It's certainly not as useful as
B-tree fillfactor, but it could still be useful in some cases...
Probably, we could leave 100 as the default fillfactor, but provide the
option.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 07/21/2015 02:00 PM, Alexander Korotkov wrote:
[...]
There are two reasons for this behavior. You mentioned the first one: the
"effective fillfactor" for posting lists is not 100%. But the second one is
the structure of posting lists. In your example the UPDATE allocates the
new tuple at the end of the heap, so from the point of view of the posting
lists these updates are appends.
Ah, good point.
The situation is different when you reuse space in the heap. See the
following example. [...]
Hmm. That's slightly different from the test case I used, in that the
update is changing the indexed value, which means that the new value
goes to a different posting tree than the old one. I'm not sure if that
makes any difference. You're also updating more rows, 1/50 vs. 1/100.
This case is closer to my earlier one:
postgres=# create table foo (id int4, i int4, t text) with (fillfactor=90);
CREATE TABLE
postgres=# insert into foo select g, 1 from generate_series(1, 1000000) g;
INSERT 0 1000000
postgres=# create index btree_foo_id on foo (id); -- to force non-HOT
updates
CREATE INDEX
postgres=# create index gin_i on foo using gin (i) with (fastupdate=off);
CREATE INDEX
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+--------+-------+---------+-------------
public | btree_foo_id | index | heikki | foo | 21 MB |
public | gin_i | index | heikki | foo | 1072 kB |
(2 rows)
postgres=# update foo set id=-id where id % 100 = 0;
UPDATE 10000
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+--------+-------+---------+-------------
public | btree_foo_id | index | heikki | foo | 22 MB |
public | gin_i | index | heikki | foo | 1080 kB |
(2 rows)
I'm not sure what's making the difference in your test case. It could be
simply that your case happens to leave less free space on each page,
because of the different data.
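One way to narrow it down would be to rerun the array-based test with the
same 1% update rate as mine, so that the update fraction is no longer a
variable; a sketch I haven't actually run:
-- rebuild the array-based table and index, then update 1/100 of the rows
drop table if exists foo;
create table foo (id integer, val int[]) with (fillfactor = 90);
insert into foo select g, array[g%2]::int[] from generate_series(1,1000000) g;
create index foo_val_idx_100 on foo using gin(val) with (fastupdate=off);
update foo set val = array[(id+1)%2]::int[] where id % 100 = 0;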
Based on this, I think we should just drop this patch. It's not useful in
practice.
I wouldn't say it's not useful at all. It's certainly not as useful as
B-tree fillfactor, but it could still be useful in some cases...
Probably, we could leave 100 as the default fillfactor, but provide the
option.
Doesn't seem worth the trouble to me...
- Heikki
On Tue, Jul 21, 2015 at 2:49 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
[...]
I'm not sure what's making the difference in your test case. It could be
simply that your case happens to leave less free space on each page,
because of the different data.
Yeah, it's likely because of different data.
Based on this, I think we should just drop this patch. It's not useful in
practice.
I wouldn't say it's not useful at all. It's certainly not as useful as
B-tree fillfactor, but it could still be useful in some cases...
Probably, we could leave 100 as the default fillfactor, but provide the
option.
Doesn't seem worth the trouble to me...
Probably, but currently we are in quite an illogical situation. We have a
default fillfactor of 90 for GiST, where it has no use case at all and is
effectively just a waste of space. On the other hand, we're rejecting
fillfactor for GIN, where it could have at least some use cases... Should
we drop fillfactor for both GiST and GIN?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 07/21/2015 02:56 PM, Alexander Korotkov wrote:
Probably, but currently we are in quite an illogical situation. We have a
default fillfactor of 90 for GiST, where it has no use case at all and is
effectively just a waste of space.
Why is it useless for GiST?
- Heikki
On Tue, Jul 21, 2015 at 3:52 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 07/21/2015 02:56 PM, Alexander Korotkov wrote:
Probably, but currently we are in quite an illogical situation. We have a
default fillfactor of 90 for GiST, where it has no use case at all and is
effectively just a waste of space.
Why is it useless for GiST?
It's because all GiST pages are produced by page splits, so just after
CREATE INDEX, GiST pages aren't tightly packed on average. Actually, they
could be tightly packed by some incredible coincidence, but for large
indexes it's quite a safe assumption that they are not. With GiST we don't
get a storm of page splits after index creation with fillfactor = 100, so
why should we reserve additional space with fillfactor = 90?
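The cost is easy to measure; here is a simple sketch using the in-core
point opclass (the exact numbers will of course depend on the data):
-- compare a GiST index built with the default fillfactor = 90 against 100
create table pts as
  select point(random(), random()) as p from generate_series(1, 1000000);
create index pts_gist_90 on pts using gist (p) with (fillfactor = 90);
create index pts_gist_100 on pts using gist (p) with (fillfactor = 100);
select relname, pg_size_pretty(pg_relation_size(oid))
  from pg_class where relname like 'pts_gist_%';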
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 07/21/2015 04:14 PM, Alexander Korotkov wrote:
[...]
It's because all GiST pages are produced by page splits, so just after
CREATE INDEX, GiST pages aren't tightly packed on average. Actually, they
could be tightly packed by some incredible coincidence, but for large
indexes it's quite a safe assumption that they are not. With GiST we don't
get a storm of page splits after index creation with fillfactor = 100, so
why should we reserve additional space with fillfactor = 90?
Aha, I see. Yeah, that's pretty useless. Ideally, we would make the GiST
build algorithm smarter so that it would pack the pages more tightly. I
have no idea how to do that, however.
Anyway, the fact that fillfactor is useless for GiST is more of an
argument for removing it from GiST than for adding it to GIN.
- Heikki
On Tue, Jul 21, 2015 at 7:20 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
[...]
Anyway, the fact that fillfactor is useless for GiST is more of an
argument for removing it from GiST than for adding it to GIN.
Yes it is. I just think that fillfactor is more useful for GIN than for
GiST now. However, it's probably not worth it for either of them.
One of our customers complains that freshly built GIN indexes get bloated
very fast. I've asked them for a test case. Using that test case I would
try to see if fillfactor for GIN could help in practice. For now we can
mark this as "Returned with feedback", where the feedback is "real-life
use cases needed".
Removing fillfactor from GiST is not easy now. Once it's exposed to users,
removing it completely would break compatibility. I would propose to
change the default fillfactor for GiST from 90 to 100.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company