Patch for fast gin cache performance improvement

Started by Ian Linkover 12 years ago13 messages
#1Ian Link
ian@ilink.io
2 attachment(s)

*

This patch contains a performance improvement for the fast gin cache. As
you may know, the performance of the fast gin cache decreases with its
size. Currently, the size of the fast gin cache is tied to work_mem. The
size of work_mem can often be quite high. The large size of work_mem is
inappropriate for the fast gin cache size. Therefore, we created a
separate cache size called gin_fast_limit. This global variable controls
the size of the fast gin cache, independently of work_mem. Currently,
the default gin_fast_limit is set to 128kB. However, that value could
need tweaking. 64kB may work better, but it's hard to say with only my
single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries
improve from about 0.9 ms to 0.030 ms. Please feel free to use the test
case yourself: it should be attached. I can look into additional test
cases (tsvectors) if anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it
is disabled. If the user does not specify a per-index limit, the index
will simply use the global limit.

I would like to thank Andrew Gierth for all his help on this patch. As
this is my first patch he was extremely helpful. The idea for this
performance improvement was entirely his. I just did the implementation.
Thanks for reading and considering this patch!*

Ian Link

Attachments:

gin-fast-perf.sqltext/plain; name=gin-fast-perf.sqlDownload
fastgin_mem.patchtext/plain; name=fastgin_mem.patchDownload
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 1162,1167 **** include 'filename'
--- 1162,1183 ----
        </listitem>
       </varlistentry>
  
+      <varlistentry id="guc-gin-fast-limit" xreflabel="gin_fast_limit">
+       <term><varname>gin_fast_limit</varname> (<type>integer</type>)</term>
+       <indexterm>
+        <primary><varname>gin_fast_limit</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         Specifies the amount of memory to be used by the fastgin cache before writing
+         to the GIN index itself. The value defaults to <literal>128kB</literal>.
+         The fastgin cache is more effective when it is small. The size of <varname>gin_fast_limit</varname>
+         should be kept significantly lower than <varname>work_mem</varname>. A value of 0 will turn off 
+         the cache entirely. This global can be overridden by the index setting <varname>fast_cache_size</varname>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
        <term><varname>max_stack_depth</varname> (<type>integer</type>)</term>
        <indexterm>
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
***************
*** 355,360 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</
--- 355,373 ----
      </note>
      </listitem>
     </varlistentry>
+    <varlistentry>
+     <term><literal>FAST_CACHE_SIZE</></term>
+     <listitem>
+     <para>
+      This setting controls the size of the fast update cache described in
+      <xref linkend="gin-fast-update">. The parameter describes the size (in kB)
+      of the fast update cache. A value of -1 indicates that the index will 
+      use the current global size (<literal>gin_fast_limit</>). However, if the
+      value is above -1, the index-specific value will be used.
+      The default is <literal>-1</>.
+     </para>
+     </listitem>
+    </varlistentry>
     </variablelist>
    </refsect2>
  
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 177,182 **** static relopt_int intRelOpts[] =
--- 177,189 ----
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		}, -1, 0, 2000000000
  	},
+ 	{
+ 		{
+ 			"fast_cache_size",
+ 			"Size of the fast gin cache, in kB",
+ 			RELOPT_KIND_GIN
+ 		}, -1, 0, INT_MAX
+ 	},
  	/* list terminator */
  	{{NULL}}
  };
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 28,33 ****
--- 28,40 ----
  #define GIN_PAGE_FREESIZE \
  	( BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - MAXALIGN(sizeof(GinPageOpaqueData)) )
  
+ #define GET_RELATION_LIMIT(relation) \
+ 	((relation)->rd_options ? \
+ 	 ((GinOptions *) (relation)->rd_options)->fastCacheSize : (-1))
+ 
+ #define HAS_RELATION_LIMIT(relation) \
+ 	(GET_RELATION_LIMIT(relation) > -1)
+ 
  typedef struct KeyArray
  {
  	Datum	   *keys;			/* expansible array */
***************
*** 426,433 **** ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
  	 *
  	 * ginInsertCleanup() should not be called inside our CRIT_SECTION.
  	 */
! 	if (metadata->nPendingPages * GIN_PAGE_FREESIZE > work_mem * 1024L)
  		needCleanup = true;
  
  	UnlockReleaseBuffer(metabuffer);
  
--- 433,450 ----
  	 *
  	 * ginInsertCleanup() should not be called inside our CRIT_SECTION.
  	 */
! 
! 	if (HAS_RELATION_LIMIT(index))
! 	{
! 		if (metadata->nPendingPages * GIN_PAGE_FREESIZE > GET_RELATION_LIMIT(index) * 1024L)
! 		{
! 			needCleanup = true;
! 		}
! 	} 
! 	else if (metadata->nPendingPages * GIN_PAGE_FREESIZE > gin_fast_limit * 1024L)
! 	{
  		needCleanup = true;
+ 	}
  
  	UnlockReleaseBuffer(metabuffer);
  
*** a/src/backend/access/gin/gininsert.c
--- b/src/backend/access/gin/gininsert.c
***************
*** 586,591 **** gininsert(PG_FUNCTION_ARGS)
--- 586,592 ----
  
  	if (GinGetUseFastUpdate(index))
  	{
+ 		// elog(NOTICE, "using fast update");
  		GinTupleCollector collector;
  
  		memset(&collector, 0, sizeof(GinTupleCollector));
***************
*** 600,605 **** gininsert(PG_FUNCTION_ARGS)
--- 601,607 ----
  	}
  	else
  	{
+ 		// elog(NOTICE, "not using fast update");
  		for (i = 0; i < ginstate.origTupdesc->natts; i++)
  			ginHeapTupleInsert(&ginstate, (OffsetNumber) (i + 1),
  							   values[i], isnull[i],
*** a/src/backend/access/gin/ginutil.c
--- b/src/backend/access/gin/ginutil.c
***************
*** 501,507 **** ginoptions(PG_FUNCTION_ARGS)
  	GinOptions *rdopts;
  	int			numoptions;
  	static const relopt_parse_elt tab[] = {
! 		{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)}
  	};
  
  	options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
--- 501,508 ----
  	GinOptions *rdopts;
  	int			numoptions;
  	static const relopt_parse_elt tab[] = {
! 		{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
! 		{"fast_cache_size", RELOPT_TYPE_INT, offsetof(GinOptions, fastCacheSize)}
  	};
  
  	options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
***************
*** 599,601 **** ginUpdateStats(Relation index, const GinStatsData *stats)
--- 600,640 ----
  
  	END_CRIT_SECTION();
  }
+ 
+ /*
+  * Used to check if the gin fast cache should be used.
+  * Use the gin_fast_limit global to turn off the cache entirely.
+  * The per-index sizes will kick in if the user has set a size and enabled fastCacheSize
+  */
+ bool
+ GinGetUseFastUpdate(Relation relation)
+ {
+ 	if((relation)->rd_options)
+ 	{
+ 		if( ((GinOptions *) (relation)->rd_options)->fastCacheSize > 0
+ 			&& ((GinOptions *) (relation)->rd_options)->useFastUpdate)
+ 		{
+ 			return true;
+ 		} 
+ 		else if(!((GinOptions *) (relation)->rd_options)->useFastUpdate)
+ 		{
+ 			return false;
+ 		}
+ 		else if(gin_fast_limit > 0)
+ 		{
+ 			return true;
+ 		}
+ 		else
+ 		{
+ 			return false;
+ 		}
+ 	} 
+ 	else if(gin_fast_limit > 0)
+ 	{
+ 		return true;
+ 	}
+ 	else
+ 	{
+ 		return false;
+ 	}
+ }
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
***************
*** 99,104 **** int			CTimeZone = 0;
--- 99,105 ----
  bool		enableFsync = true;
  bool		allowSystemTableMods = false;
  int			work_mem = 1024;
+ int			gin_fast_limit = 128;
  int			maintenance_work_mem = 16384;
  
  /*
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 1711,1716 **** static struct config_int ConfigureNamesInt[] =
--- 1711,1728 ----
  	},
  
  	{
+ 		{"gin_fast_limit", PGC_USERSET, RESOURCES_MEM,
+ 			gettext_noop("Sets the maximum memory to be used for fastgin operations."),
+ 			gettext_noop("This represents the maximum amount of space stored in "
+ 						 "the fast gin cache."),
+ 			GUC_UNIT_KB
+ 		},
+ 		&gin_fast_limit,
+ 		128, 0, MAX_KILOBYTES,
+ 		NULL, NULL, NULL
+ 	},
+ 
+ 	{
  		{"maintenance_work_mem", PGC_USERSET, RESOURCES_MEM,
  			gettext_noop("Sets the maximum memory to be used for maintenance operations."),
  			gettext_noop("This includes operations such as VACUUM and CREATE INDEX."),
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 121,126 ****
--- 121,127 ----
  # It is not advisable to set max_prepared_transactions nonzero unless you
  # actively intend to use prepared transactions.
  #work_mem = 1MB				# min 64kB
+ #gin_fast_limit = 128kB         # min 0kB
  #maintenance_work_mem = 16MB		# min 1MB
  #max_stack_depth = 2MB			# min 100kB
  
*** a/src/include/access/gin.h
--- b/src/include/access/gin.h
***************
*** 46,52 **** typedef struct GinStatsData
  	int32		ginVersion;
  } GinStatsData;
  
! /* GUC parameter */
  extern PGDLLIMPORT int GinFuzzySearchLimit;
  
  /* ginutil.c */
--- 46,52 ----
  	int32		ginVersion;
  } GinStatsData;
  
! /* GUC parameters */
  extern PGDLLIMPORT int GinFuzzySearchLimit;
  
  /* ginutil.c */
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
***************
*** 262,275 **** typedef struct GinOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	bool		useFastUpdate;	/* use fast updates? */
  } GinOptions;
  
- #define GIN_DEFAULT_USE_FASTUPDATE	true
- #define GinGetUseFastUpdate(relation) \
- 	((relation)->rd_options ? \
- 	 ((GinOptions *) (relation)->rd_options)->useFastUpdate : GIN_DEFAULT_USE_FASTUPDATE)
- 
- 
  /* Macros for buffer lock/unlock operations */
  #define GIN_UNLOCK	BUFFER_LOCK_UNLOCK
  #define GIN_SHARE	BUFFER_LOCK_SHARE
--- 262,270 ----
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	bool		useFastUpdate;	/* use fast updates? */
+ 	int32		fastCacheSize;	/* cachesize - independent of work_mem */
  } GinOptions;
  
  /* Macros for buffer lock/unlock operations */
  #define GIN_UNLOCK	BUFFER_LOCK_UNLOCK
  #define GIN_SHARE	BUFFER_LOCK_SHARE
***************
*** 443,448 **** extern int ginCompareAttEntries(GinState *ginstate,
--- 438,444 ----
  extern Datum *ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
  				  Datum value, bool isNull,
  				  int32 *nentries, GinNullCategory **categories);
+ extern bool GinGetUseFastUpdate(Relation relation);
  
  extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
  extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
***************
*** 230,235 **** extern int	CTimeZone;
--- 230,236 ----
  extern bool enableFsync;
  extern bool allowSystemTableMods;
  extern PGDLLIMPORT int work_mem;
+ extern PGDLLIMPORT int gin_fast_limit;
  extern PGDLLIMPORT int maintenance_work_mem;
  
  extern int	VacuumCostPageHit;
#2Kevin Grittner
kgrittn@ymail.com
In reply to: Ian Link (#1)
Re: Patch for fast gin cache performance improvement

Ian Link <ian@ilink.io> wrote:

This patch contains a performance improvement for the fast gin
cache.

Our test queries improve from about 0.9 ms to 0.030 ms.

Impressive!

Thanks for reading and considering this patch!

Congratulations on your first PostgreSQL patch!  To get it
scheduled for review, please add it to this page:

https://commitfest.postgresql.org/action/commitfest_view/open

You will need to get a community login (if you don't already have
one), but that is a quick and painless process.  Choose an
appropriate topic (like "Performance") and reference the message ID
of the email to which you attached the patch.  Don't worry about
the fields for reviewers, committer, or date closed.

Sorry for the administrative overhead, but without it things can
fall through the cracks.  You can find an overview of the review
process with links to more detail here:

http://wiki.postgresql.org/wiki/CommitFest

Thanks for contributing!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Kevin Grittner
kgrittn@ymail.com
In reply to: Kevin Grittner (#2)
Re: Patch for fast gin cache performance improvement

Kevin Grittner <kgrittn@ymail.com> wrote:

You will need to get a community login (if you don't already have
one), but that is a quick and painless process.

Oops -- we seem to have a problem with new community logins at the
moment, which will hopefully be straightened out soon.  You might
want to wait a few days if you don't already have a login.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Ian Link
ian@ilink.io
In reply to: Kevin Grittner (#3)
2 attachment(s)
Re: Patch for fast gin cache performance improvement

No worries! I'll just wait until it's up again.
Thanks
Ian

Show quoted text

Kevin Grittner <mailto:kgrittn@ymail.com>
Tuesday, June 18, 2013 9:15 AM

Oops -- we seem to have a problem with new community logins at the
moment, which will hopefully be straightened out soon. You might
want to wait a few days if you don't already have a login.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <mailto:kgrittn@ymail.com>
Tuesday, June 18, 2013 9:09 AM
Ian Link<ian@ilink.io> wrote:

This patch contains a performance improvement for the fast gin
cache.

Our test queries improve from about 0.9 ms to 0.030 ms.

Impressive!

Thanks for reading and considering this patch!

Congratulations on your first PostgreSQL patch! To get it
scheduled for review, please add it to this page:

https://commitfest.postgresql.org/action/commitfest_view/open

You will need to get a community login (if you don't already have
one), but that is a quick and painless process. Choose an
appropriate topic (like "Performance") and reference the message ID
of the email to which you attached the patch. Don't worry about
the fields for reviewers, committer, or date closed.

Sorry for the administrative overhead, but without it things can
fall through the cracks. You can find an overview of the review
process with links to more detail here:

http://wiki.postgresql.org/wiki/CommitFest

Thanks for contributing!

Ian Link <mailto:ian@ilink.io>
Monday, June 17, 2013 9:42 PM
*

This patch contains a performance improvement for the fast gin cache.
As you may know, the performance of the fast gin cache decreases with
its size. Currently, the size of the fast gin cache is tied to
work_mem. The size of work_mem can often be quite high. The large size
of work_mem is inappropriate for the fast gin cache size. Therefore,
we created a separate cache size called gin_fast_limit. This global
variable controls the size of the fast gin cache, independently of
work_mem. Currently, the default gin_fast_limit is set to 128kB.
However, that value could need tweaking. 64kB may work better, but
it's hard to say with only my single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries
improve from about 0.9 ms to 0.030 ms. Please feel free to use the
test case yourself: it should be attached. I can look into additional
test cases (tsvectors) if anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that
it is disabled. If the user does not specify a per-index limit, the
index will simply use the global limit.

I would like to thank Andrew Gierth for all his help on this patch. As
this is my first patch he was extremely helpful. The idea for this
performance improvement was entirely his. I just did the
implementation. Thanks for reading and considering this patch!*

Ian Link

Attachments:

postbox-contact.jpgimage/jpeg; name=postbox-contact.jpg; x-apple-mail-type=stationeryDownload
compose-unknown-contact.jpgimage/jpeg; name=compose-unknown-contact.jpg; x-apple-mail-type=stationeryDownload
#5ian link
ian@ilink.io
In reply to: Ian Link (#4)
2 attachment(s)
Re: Patch for fast gin cache performance improvement

Looks like my community login is still not working. No rush, just wanted to
let you know. Thanks!

Ian

On Tue, Jun 18, 2013 at 11:41 AM, Ian Link <ian@ilink.io> wrote:

Show quoted text

No worries! I'll just wait until it's up again.
Thanks
Ian

Kevin Grittner <kgrittn@ymail.com>
Tuesday, June 18, 2013 9:15 AM

Oops -- we seem to have a problem with new community logins at the
moment, which will hopefully be straightened out soon. You might
want to wait a few days if you don't already have a login.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com>
Tuesday, June 18, 2013 9:09 AM

Ian Link <ian@ilink.io> <ian@ilink.io> wrote:

This patch contains a performance improvement for the fast gin
cache.

Our test queries improve from about 0.9 ms to 0.030 ms.

Impressive!

Thanks for reading and considering this patch!

Congratulations on your first PostgreSQL patch! To get it
scheduled for review, please add it to this page:
https://commitfest.postgresql.org/action/commitfest_view/open

You will need to get a community login (if you don't already have
one), but that is a quick and painless process. Choose an
appropriate topic (like "Performance") and reference the message ID
of the email to which you attached the patch. Don't worry about
the fields for reviewers, committer, or date closed.

Sorry for the administrative overhead, but without it things can
fall through the cracks. You can find an overview of the review
process with links to more detail here:

http://wiki.postgresql.org/wiki/CommitFest

Thanks for contributing!

Ian Link <ian@ilink.io>
Monday, June 17, 2013 9:42 PM
*

This patch contains a performance improvement for the fast gin cache. As
you may know, the performance of the fast gin cache decreases with its
size. Currently, the size of the fast gin cache is tied to work_mem. The
size of work_mem can often be quite high. The large size of work_mem is
inappropriate for the fast gin cache size. Therefore, we created a separate
cache size called gin_fast_limit. This global variable controls the size of
the fast gin cache, independently of work_mem. Currently, the default
gin_fast_limit is set to 128kB. However, that value could need tweaking.
64kB may work better, but it's hard to say with only my single machine to
test on.

On my machine, this patch results in a nice speed up. Our test queries
improve from about 0.9 ms to 0.030 ms. Please feel free to use the test
case yourself: it should be attached. I can look into additional test cases
(tsvectors) if anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it is
disabled. If the user does not specify a per-index limit, the index will
simply use the global limit.

I would like to thank Andrew Gierth for all his help on this patch. As
this is my first patch he was extremely helpful. The idea for this
performance improvement was entirely his. I just did the implementation.
Thanks for reading and considering this patch!*

Ian Link

Attachments:

postbox-contact.jpgimage/jpeg; name=postbox-contact.jpg; x-apple-mail-type=stationeryDownload
compose-unknown-contact.jpgimage/jpeg; name=compose-unknown-contact.jpg; x-apple-mail-type=stationeryDownload
#6Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: ian link (#5)
Re: Patch for fast gin cache performance improvement

On 06/23/2013 04:03 AM, ian link wrote:

Looks like my community login is still not working. No rush, just wanted
to let you know. Thanks!

have you tried to log in once to the main website per:

/messages/by-id/CABUevEyt9tQfcF7T2Uzcr8WeF9M=s8qSACuCmN5L2Et26=RTjw@mail.gmail.com

?

Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Ian Link
ian@ilink.io
In reply to: Stefan Kaltenbrunner (#6)
2 attachment(s)
Re: Patch for fast gin cache performance improvement

I just tried it and my account works now. Thanks!
Ian

Show quoted text

Stefan Kaltenbrunner <mailto:stefan@kaltenbrunner.cc>
Sunday, June 23, 2013 2:05 AM

have you tried to log in once to the main website per:

/messages/by-id/CABUevEyt9tQfcF7T2Uzcr8WeF9M=s8qSACuCmN5L2Et26=RTjw@mail.gmail.com

?

Stefan
ian link <mailto:ian@ilink.io>
Saturday, June 22, 2013 7:03 PM
Looks like my community login is still not working. No rush, just
wanted to let you know. Thanks!

Ian

Ian Link <mailto:ian@ilink.io>
Tuesday, June 18, 2013 11:41 AM

No worries! I'll just wait until it's up again.
Thanks
Ian
Kevin Grittner <mailto:kgrittn@ymail.com>
Tuesday, June 18, 2013 9:15 AM

Oops -- we seem to have a problem with new community logins at the
moment, which will hopefully be straightened out soon. You might
want to wait a few days if you don't already have a login.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <mailto:kgrittn@ymail.com>
Tuesday, June 18, 2013 9:09 AM
Ian Link<ian@ilink.io> wrote:

This patch contains a performance improvement for the fast gin
cache.

Our test queries improve from about 0.9 ms to 0.030 ms.

Impressive!

Thanks for reading and considering this patch!

Congratulations on your first PostgreSQL patch! To get it
scheduled for review, please add it to this page:

https://commitfest.postgresql.org/action/commitfest_view/open

You will need to get a community login (if you don't already have
one), but that is a quick and painless process. Choose an
appropriate topic (like "Performance") and reference the message ID
of the email to which you attached the patch. Don't worry about
the fields for reviewers, committer, or date closed.

Sorry for the administrative overhead, but without it things can
fall through the cracks. You can find an overview of the review
process with links to more detail here:

http://wiki.postgresql.org/wiki/CommitFest

Thanks for contributing!

Attachments:

compose-unknown-contact.jpgimage/jpeg; name=compose-unknown-contact.jpg; x-apple-mail-type=stationeryDownload
postbox-contact.jpgimage/jpeg; name=postbox-contact.jpg; x-apple-mail-type=stationeryDownload
#8Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Ian Link (#1)
Re: Patch for fast gin cache performance improvement

Hi Ian,

This patch contains a performance improvement for the fast gin cache. As you
may know, the performance of the fast gin cache decreases with its size.
Currently, the size of the fast gin cache is tied to work_mem. The size of
work_mem can often be quite high. The large size of work_mem is inappropriate
for the fast gin cache size. Therefore, we created a separate cache size

called

gin_fast_limit. This global variable controls the size of the fast gin cache,
independently of work_mem. Currently, the default gin_fast_limit is set to

128kB.

However, that value could need tweaking. 64kB may work better, but it's hard
to say with only my single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries improve
from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
it should be attached. I can look into additional test cases (tsvectors) if
anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it is
disabled. If the user does not specify a per-index limit, the index will

simply

use the global limit.

I had a look over this patch. I think this patch is interesting and very
useful. Here are my review points:

1. Patch applies cleanly.
2. make, make install and make check is good.
3. I did performance evaluation using your test queries with 64kB and 128kB of
gin_fast_limit (or fast_cache_size), and saw that both values achieved the
performance gains over gin_fast_limit = '256MB'. 64kB worked better than 128kB.
64kB improved from 1.057 ms to 0.075 ms. Great!
4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance. Am I right? If so, I think the tradeoff should be
noted in the documentation.
5. The following documents in Chapter 57. GIN Indexes need to be updated:
* 57.3.1. GIN Fast Update Technique
* 57.4. GIN Tips and Tricks
6. I would like to see the results for the additional test cases (tsvectors).
7. The commented-out elog() code should be removed.
8. I think there are no issues in this patch. However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry for the delay.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#8)
Re: Patch for fast gin cache performance improvement

I wrote:

I had a look over this patch. I think this patch is interesting and very

useful.

Here are my review points:

8. I think there are no issues in this patch. However, I have one question:
how this patch works in the case where gin_fast_limit/fast_cache_size = 0? In
this case, in my understanding, this patch inserts new entries into the

pending

list temporarily and immediately moves them to the main GIN data structure

using

ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry, There are incorrect expressions. I mean gin_fast_limit > 0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users. So I'd like to propose to introduce only one parameter:
fast_cache_size. While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value. What do you think about this?

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Ian Link
ian@ilink.io
In reply to: Etsuro Fujita (#9)
1 attachment(s)
Re: Patch for fast gin cache performance improvement

Hi Etsuro,
Sorry for the delay but I have been very busy with work. I have been
away from postgres for a while, so I will need a little time to review
the code and make sure I give you an informed response. I'll get back to
you as soon as I am able. Thanks for understanding.
Ian Link

Show quoted text

Etsuro Fujita <mailto:fujita.etsuro@lab.ntt.co.jp>
Friday, September 27, 2013 2:24 AM
I wrote:

I had a look over this patch. I think this patch is interesting and very

useful.

Here are my review points:

8. I think there are no issues in this patch. However, I have one question:
how this patch works in the case where gin_fast_limit/fast_cache_size = 0? In
this case, in my understanding, this patch inserts new entries into the

pending

list temporarily and immediately moves them to the main GIN data structure

using

ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry, There are incorrect expressions. I mean gin_fast_limit> 0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users. So I'd like to propose to introduce only one parameter:
fast_cache_size. While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value. What do you think about this?

Thanks,

Best regards,
Etsuro Fujita

Etsuro Fujita <mailto:fujita.etsuro@lab.ntt.co.jp>
Thursday, September 26, 2013 6:02 AM
Hi Ian,

This patch contains a performance improvement for the fast gin cache. As you
may know, the performance of the fast gin cache decreases with its size.
Currently, the size of the fast gin cache is tied to work_mem. The size of
work_mem can often be quite high. The large size of work_mem is inappropriate
for the fast gin cache size. Therefore, we created a separate cache size

called

gin_fast_limit. This global variable controls the size of the fast gin cache,
independently of work_mem. Currently, the default gin_fast_limit is set to

128kB.

However, that value could need tweaking. 64kB may work better, but it's hard
to say with only my single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries improve
from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
it should be attached. I can look into additional test cases (tsvectors) if
anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it is
disabled. If the user does not specify a per-index limit, the index will

simply

use the global limit.

I had a look over this patch. I think this patch is interesting and very
useful. Here are my review points:

1. Patch applies cleanly.
2. make, make install and make check is good.
3. I did performance evaluation using your test queries with 64kB and 128kB of
gin_fast_limit (or fast_cache_size), and saw that both values achieved the
performance gains over gin_fast_limit = '256MB'. 64kB worked better than 128kB.
64kB improved from 1.057 ms to 0.075 ms. Great!
4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance. Am I right? If so, I think the tradeoff should be
noted in the documentation.
5. The following documents in Chapter 57. GIN Indexes need to be updated:
* 57.3.1. GIN Fast Update Technique
* 57.4. GIN Tips and Tricks
6. I would like to see the results for the additional test cases (tsvectors).
7. The commented-out elog() code should be removed.
8. I think there are no issues in this patch. However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry for the delay.

Best regards,
Etsuro Fujita

Ian Link <mailto:ian@ilink.io>
Monday, June 17, 2013 9:42 PM
*

This patch contains a performance improvement for the fast gin cache.
As you may know, the performance of the fast gin cache decreases with
its size. Currently, the size of the fast gin cache is tied to
work_mem. The size of work_mem can often be quite high. The large size
of work_mem is inappropriate for the fast gin cache size. Therefore,
we created a separate cache size called gin_fast_limit. This global
variable controls the size of the fast gin cache, independently of
work_mem. Currently, the default gin_fast_limit is set to 128kB.
However, that value could need tweaking. 64kB may work better, but
it's hard to say with only my single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries
improve from about 0.9 ms to 0.030 ms. Please feel free to use the
test case yourself: it should be attached. I can look into additional
test cases (tsvectors) if anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that
it is disabled. If the user does not specify a per-index limit, the
index will simply use the global limit.

I would like to thank Andrew Gierth for all his help on this patch. As
this is my first patch he was extremely helpful. The idea for this
performance improvement was entirely his. I just did the
implementation. Thanks for reading and considering this patch!*

Ian Link

Attachments:

compose-unknown-contact.jpgimage/jpeg; name=compose-unknown-contact.jpg; x-apple-mail-type=stationeryDownload
#11Ian Link
ian@ilink.io
In reply to: Ian Link (#10)
Re: Patch for fast gin cache performance improvement

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users. So I'd like to propose to introduce only one parameter:
fast_cache_size. While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value. What do you think about this?

I think it makes sense to maintain this separation. If the user doesn't
need a per-index setting, they don't have to use the parameter. Since
the parameter is off by default, they don't even need to worry about it.
There might as well be one parameter for users that don't need
fine-grained control. We can document this and I don't think it will be
confusing to the user.

4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance. Am I right? If so, I think the tradeoff should be
noted in the documentation.

I believe this is correct.

5. The following documents in Chapter 57. GIN Indexes need to be
updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and Tricks

Sure, I can add something.

6. I would like to see the results for the additional test cases (tsvectors).

I don't really have any good test cases for this available, and have
very limited time for postgres at the moment. Feel free to create a test
case, but I don't believe I can at the moment. Sorry!

7. The commented-out elog() code should be removed.

Sorry about that, I shouldn't have submitted the patch with those still
there.

I should have a new patch soonish, hopefully. Thanks for your feedback!
Ian

Ian Link wrote:

8. I think there are no issues in this patch. However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Ian Link (#11)
Re: Patch for fast gin cache performance improvement

Ian Link wrote:

Although I asked this question, I've reconsidered about these
parameters, and it seems that these parameters not only make code
rather complex but are a little confusing to users. So I'd like to propose

to introduce only one parameter:

fast_cache_size. While users that give weight to update performance
for the fast update technique set this parameter to a large value,
users that give weight not only to update performance but to search
performance set this parameter to a small value. What do you think about

this?
I think it makes sense to maintain this separation. If the user doesn't need
a per-index setting, they don't have to use the parameter. Since the parameter
is off by default, they don't even need to worry about it.
There might as well be one parameter for users that don't need fine-grained
control. We can document this and I don't think it will be confusing to the
user.

OK, though I'd like to hear the opinion of others.

4. In my understanding, the small value of
gin_fast_limit/fast_cache_size leads to the increase in GIN search
performance, which, however, leads to the decrease in GIN update
performance. Am I right? If so, I think the tradeoff should be noted in

the documentation.
I believe this is correct.

5. The following documents in Chapter 57. GIN Indexes need to be
updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and
Tricks

Sure, I can add something.

6. I would like to see the results for the additional test cases

(tsvectors).

I don't really have any good test cases for this available, and have very

limited

time for postgres at the moment. Feel free to create a test case, but I don't
believe I can at the moment. Sorry!

Unfortunately, I don't have much time to do such a thing, though I think we
should do that. (In addition, we should do another performance test to make
clear an influence of fast update performance from introducing these parameters,
which would be required to determine the default values of these parameters.)

7. The commented-out elog() code should be removed.

Sorry about that, I shouldn't have submitted the patch with those still there.

I should have a new patch soonish, hopefully. Thanks for your feedback!

I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above. So I'd like to mark
this patch as Returned with Feedback. Is it OK?

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Ian Link
ian@ilink.io
In reply to: Etsuro Fujita (#12)
1 attachment(s)
Re: Patch for fast gin cache performance improvement

I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above. So I'd like to mark
this patch as Returned with Feedback. Is it OK?

Sounds like a good idea. Thanks for the review!
Ian Link

Show quoted text

Etsuro Fujita <mailto:fujita.etsuro@lab.ntt.co.jp>
Thursday, October 10, 2013 1:01 AM
Ian Link wrote:

Although I asked this question, I've reconsidered about these
parameters, and it seems that these parameters not only make code
rather complex but are a little confusing to users. So I'd like to propose

to introduce only one parameter:

fast_cache_size. While users that give weight to update performance
for the fast update technique set this parameter to a large value,
users that give weight not only to update performance but to search
performance set this parameter to a small value. What do you think about

this?
I think it makes sense to maintain this separation. If the user doesn't need
a per-index setting, they don't have to use the parameter. Since the parameter
is off by default, they don't even need to worry about it.
There might as well be one parameter for users that don't need fine-grained
control. We can document this and I don't think it will be confusing to the
user.

OK, though I'd like to hear the opinion of others.

4. In my understanding, the small value of
gin_fast_limit/fast_cache_size leads to the increase in GIN search
performance, which, however, leads to the decrease in GIN update
performance. Am I right? If so, I think the tradeoff should be noted in

the documentation.
I believe this is correct.

5. The following documents in Chapter 57. GIN Indexes need to be
updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and
Tricks

Sure, I can add something.

6. I would like to see the results for the additional test cases

(tsvectors).

I don't really have any good test cases for this available, and have very

limited

time for postgres at the moment. Feel free to create a test case, but I don't
believe I can at the moment. Sorry!

Unfortunately, I don't have much time to do such a thing, though I think we
should do that. (In addition, we should do another performance test to make
clear an influence of fast update performance from introducing these parameters,
which would be required to determine the default values of these parameters.)

7. The commented-out elog() code should be removed.

Sorry about that, I shouldn't have submitted the patch with those still there.

I should have a new patch soonish, hopefully. Thanks for your feedback!

I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above. So I'd like to mark
this patch as Returned with Feedback. Is it OK?

Thanks,

Best regards,
Etsuro Fujita

Ian Link <mailto:ian@ilink.io>
Monday, September 30, 2013 3:09 PM
Hi Etsuro,
Sorry for the delay but I have been very busy with work. I have been
away from postgres for a while, so I will need a little time to review
the code and make sure I give you an informed response. I'll get back
to you as soon as I am able. Thanks for understanding.
Ian Link

Etsuro Fujita <mailto:fujita.etsuro@lab.ntt.co.jp>
Friday, September 27, 2013 2:24 AM
I wrote:

I had a look over this patch. I think this patch is interesting and very

useful.

Here are my review points:

8. I think there are no issues in this patch. However, I have one question:
how this patch works in the case where gin_fast_limit/fast_cache_size = 0? In
this case, in my understanding, this patch inserts new entries into the

pending

list temporarily and immediately moves them to the main GIN data structure

using

ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry, There are incorrect expressions. I mean gin_fast_limit> 0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users. So I'd like to propose to introduce only one parameter:
fast_cache_size. While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value. What do you think about this?

Thanks,

Best regards,
Etsuro Fujita

Etsuro Fujita <mailto:fujita.etsuro@lab.ntt.co.jp>
Thursday, September 26, 2013 6:02 AM
Hi Ian,

This patch contains a performance improvement for the fast gin cache. As you
may know, the performance of the fast gin cache decreases with its size.
Currently, the size of the fast gin cache is tied to work_mem. The size of
work_mem can often be quite high. The large size of work_mem is inappropriate
for the fast gin cache size. Therefore, we created a separate cache size

called

gin_fast_limit. This global variable controls the size of the fast gin cache,
independently of work_mem. Currently, the default gin_fast_limit is set to

128kB.

However, that value could need tweaking. 64kB may work better, but it's hard
to say with only my single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries improve
from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
it should be attached. I can look into additional test cases (tsvectors) if
anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it is
disabled. If the user does not specify a per-index limit, the index will

simply

use the global limit.

I had a look over this patch. I think this patch is interesting and very
useful. Here are my review points:

1. Patch applies cleanly.
2. make, make install and make check is good.
3. I did performance evaluation using your test queries with 64kB and 128kB of
gin_fast_limit (or fast_cache_size), and saw that both values achieved the
performance gains over gin_fast_limit = '256MB'. 64kB worked better than 128kB.
64kB improved from 1.057 ms to 0.075 ms. Great!
4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance. Am I right? If so, I think the tradeoff should be
noted in the documentation.
5. The following documents in Chapter 57. GIN Indexes need to be updated:
* 57.3.1. GIN Fast Update Technique
* 57.4. GIN Tips and Tricks
6. I would like to see the results for the additional test cases (tsvectors).
7. The commented-out elog() code should be removed.
8. I think there are no issues in this patch. However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry for the delay.

Best regards,
Etsuro Fujita

Ian Link <mailto:ian@ilink.io>
Monday, June 17, 2013 9:42 PM
*

This patch contains a performance improvement for the fast gin cache.
As you may know, the performance of the fast gin cache decreases with
its size. Currently, the size of the fast gin cache is tied to
work_mem. The size of work_mem can often be quite high. The large size
of work_mem is inappropriate for the fast gin cache size. Therefore,
we created a separate cache size called gin_fast_limit. This global
variable controls the size of the fast gin cache, independently of
work_mem. Currently, the default gin_fast_limit is set to 128kB.
However, that value could need tweaking. 64kB may work better, but
it's hard to say with only my single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries
improve from about 0.9 ms to 0.030 ms. Please feel free to use the
test case yourself: it should be attached. I can look into additional
test cases (tsvectors) if anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that
it is disabled. If the user does not specify a per-index limit, the
index will simply use the global limit.

I would like to thank Andrew Gierth for all his help on this patch. As
this is my first patch he was extremely helpful. The idea for this
performance improvement was entirely his. I just did the
implementation. Thanks for reading and considering this patch!*

Ian Link

Attachments:

compose-unknown-contact.jpgimage/jpeg; name=compose-unknown-contact.jpg; x-apple-mail-type=stationeryDownload