WIP: preloading of ispell dictionary
Hello
I wrote a small patch that allows preloading of a selected ispell
dictionary. It solves the problem of slow tsearch initialisation with
some language configurations.
This patch is the simplest variant - simpler than a variant with shared
memory - and it is usable on the Linux platform.
I ran into some issues with access to different kinds of memory :(.
Local memory is fastest, then shared memory, and then virtual
memory. Queries with the preloaded dictionary are about 20% slower (but
still fast enough). It depends on the platform (and the language, of
course) - I am afraid this module doesn't help on MS Windows.
Tested on 64bit Fedora Linux - these issues will probably be smaller on 32bit.
I would like to add this patch to the next commitfest.
Can somebody test it on different platforms and with languages other than Czech?
Regards
Pavel Stehule
Attachments:
preload.diff (application/octet-stream)
*** ./contrib/dict_preload/dict_preload.c.orig 2010-03-18 17:00:33.281409707 +0100
--- ./contrib/dict_preload/dict_preload.c 2010-03-19 11:09:29.870831600 +0100
***************
*** 0 ****
--- 1,186 ----
+ /*-------------------------------------------------------------------------
+ *
+ * dict_preload.c
+ * preloaded dictionary
+ *
+ * Copyright (c) 2007-2010, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * $PostgreSQL: pgsql/contrib/dict_preload/dict_preload.c,v 1.6 2010/01/02 16:57:32 momjian Exp $
+ *
+ *-------------------------------------------------------------------------
+ */
+ #include "postgres.h"
+
+ #include "commands/defrem.h"
+ #include "fmgr.h"
+ #include "miscadmin.h"
+ #include "nodes/makefuncs.h"
+ #include "nodes/value.h"
+ #include "tsearch/ts_public.h"
+ #include "tsearch/ts_utils.h"
+ #include "tsearch/dicts/spell.h"
+ #include "utils/guc.h"
+ #include "utils/memutils.h"
+
+ PG_MODULE_MAGIC;
+
+ char *preload_dictfile = NULL;
+ char *preload_afffile = NULL;
+ char *preload_stopwords = NULL;
+
+ typedef struct
+ {
+ StopList stoplist;
+ IspellDict obj;
+ } DictISpell;
+
+ MemoryContext preload_ctx = NULL;
+
+ DictISpell *preload_dict = NULL;
+
+ PG_FUNCTION_INFO_V1(dpreloaddict_init);
+ Datum dpreloaddict_init(PG_FUNCTION_ARGS);
+
+ PG_FUNCTION_INFO_V1(dpreloaddict_lexize);
+ Datum dpreloaddict_lexize(PG_FUNCTION_ARGS);
+
+ void _PG_init(void);
+ void _PG_fini(void);
+
+ static DictISpell *
+ load_dictionary(void)
+ {
+ List *dictopt = NIL;
+ FunctionCallInfoData fcinfo;
+
+ /*
+ * read parameters for preloaded dictionary
+ */
+ if (preload_dictfile != NULL)
+ dictopt = lappend(dictopt, makeDefElem("DictFile",
+ (Node *) makeString(preload_dictfile)));
+ if (preload_afffile != NULL)
+ dictopt = lappend(dictopt, makeDefElem("AffFile",
+ (Node *) makeString(preload_afffile)));
+ if (preload_stopwords != NULL)
+ dictopt = lappend(dictopt, makeDefElem("StopWords",
+ (Node *) makeString(preload_stopwords)));
+
+ /*
+ * Initialise ispell dictionary
+ */
+ InitFunctionCallInfoData(fcinfo, NULL, 1, NULL, NULL);
+ fcinfo.arg[0] = PointerGetDatum(dictopt);
+ fcinfo.argnull[0] = false;
+
+ return (DictISpell *) DatumGetPointer(dispell_init(&fcinfo));
+ }
+
+ Datum
+ dpreloaddict_init(PG_FUNCTION_ARGS)
+ {
+ static bool firsttime = true;
+
+ /*
+ * dpreloaddict_init can be called multiple times:
+ * CREATE TEXT SEARCH DICTIONARY
+ * DROP TEXT SEARCH DICTIONARY
+ * CREATE TEXT SEARCH DICTIONARY
+ * ...
+ */
+ if (firsttime)
+ {
+ /* At this point, the dictionary must already be loaded */
+ Assert(MemoryContextIsValid(preload_ctx));
+ Assert(preload_dict != NULL);
+
+ /* join preloaded context to current context */
+ preload_ctx->parent = CurrentMemoryContext;
+ preload_ctx->nextchild = CurrentMemoryContext->firstchild;
+ CurrentMemoryContext->firstchild = preload_ctx;
+ preload_ctx = NULL;
+ firsttime = false;
+
+ return PointerGetDatum(preload_dict);
+ }
+ else
+ return PointerGetDatum(load_dictionary());
+ }
+
+ Datum
+ dpreloaddict_lexize(PG_FUNCTION_ARGS)
+ {
+ return dispell_lexize(fcinfo);
+ }
+
+ /*
+ * Module load callback
+ */
+ void
+ _PG_init()
+ {
+ MemoryContext oldctx;
+ static bool inited = false;
+ GucContext guc_ctx;
+
+ if (inited)
+ return;
+ else
+ inited = true;
+
+ guc_ctx = process_shared_preload_libraries_in_progress ?
+ PGC_POSTMASTER : PGC_SUSET;
+
+ /* Define custom GUC variables. */
+ DefineCustomStringVariable("dict_preload.dictfile",
+ "name of file of preloaded ispell dictionary",
+ NULL,
+ &preload_dictfile,
+ NULL,
+ guc_ctx, 0,
+ NULL, NULL);
+
+ /* Define custom GUC variables. */
+ DefineCustomStringVariable("dict_preload.afffile",
+ "name of file of preloaded ispell affix",
+ NULL,
+ &preload_afffile,
+ NULL,
+ guc_ctx, 0,
+ NULL, NULL);
+
+ /* Define custom GUC variables. */
+ DefineCustomStringVariable("dict_preload.stopwords",
+ "name of file of preloaded ispell stopwords",
+ NULL,
+ &preload_stopwords,
+ NULL,
+ guc_ctx, 0,
+ NULL, NULL);
+
+ /* preload dictionary */
+ Assert(preload_ctx == NULL);
+ Assert(preload_dict == NULL);
+
+ preload_ctx = AllocSetContextCreate(NULL, "Ispell dictionary preload context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldctx = MemoryContextSwitchTo(preload_ctx);
+
+ preload_dict = load_dictionary();
+
+ MemoryContextSwitchTo(oldctx);
+ }
+
+ /*
+ * Module unload callback
+ */
+ void
+ _PG_fini(void)
+ {
+ /* if the preload context still exists, delete it */
+ if (preload_ctx != NULL)
+ MemoryContextDelete(preload_ctx);
+ }
*** ./contrib/dict_preload/dict_preload.sql.in.orig 2010-03-18 17:00:52.317502927 +0100
--- ./contrib/dict_preload/dict_preload.sql.in 2010-03-19 08:24:33.195954842 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* $PostgreSQL: pgsql/contrib/dict_int/dict_int.sql.in,v 1.3 2007/11/13 04:24:27 momjian Exp $ */
+
+ -- Adjust this setting to control where the objects get created.
+ SET search_path = public;
+
+ CREATE OR REPLACE FUNCTION dpreloaddict_init(internal)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT;
+
+ CREATE OR REPLACE FUNCTION dpreloaddict_lexize(internal, internal, internal, internal)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT;
+
+ CREATE TEXT SEARCH TEMPLATE preloaddict(
+ LEXIZE = dpreloaddict_lexize,
+ INIT = dpreloaddict_init
+ );
*** ./contrib/dict_preload/uninstall_dict_preload.sql.orig 2010-03-18 17:00:58.039409567 +0100
--- ./contrib/dict_preload/uninstall_dict_preload.sql 2010-03-18 13:52:49.064472194 +0100
***************
*** 0 ****
--- 1,10 ----
+ /* $PostgreSQL: pgsql/contrib/dict_int/uninstall_dict_int.sql,v 1.3 2007/11/13 04:24:27 momjian Exp $ */
+
+ -- Adjust this setting to control where the objects get dropped.
+ SET search_path = public;
+
+ DROP TEXT SEARCH TEMPLATE preloaddict CASCADE;
+
+ DROP FUNCTION dpreloaddict_init(internal);
+
+ DROP FUNCTION dpreloaddict_lexize(internal,internal,internal,internal);
*** ./doc/src/sgml/contrib.sgml.orig 2010-01-29 00:59:52.000000000 +0100
--- ./doc/src/sgml/contrib.sgml 2010-03-19 10:35:50.203430470 +0100
***************
*** 89,94 ****
--- 89,95 ----
&cube;
&dblink;
&dict-int;
+ &dict-preload;
&dict-xsyn;
&earthdistance;
&fuzzystrmatch;
*** ./doc/src/sgml/dict-preload.sgml.orig 2010-03-19 10:19:04.557472568 +0100
--- ./doc/src/sgml/dict-preload.sgml 2010-03-19 10:42:29.856554695 +0100
***************
*** 0 ****
--- 1,56 ----
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/dict-int.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
+
+ <sect1 id="dict-preload">
+ <title>dict_preload</title>
+
+ <indexterm zone="dict-preload">
+ <primary>dict_preload</primary>
+ </indexterm>
+
+ <para>
+ <filename>dict_preload</> is an example of an add-on dictionary template
+ for full-text search. The motivation for this example dictionary is to
+ allow preloading an ispell dictionary.
+ </para>
+
+ <sect2>
+ <title>Configuration</title>
+
+ <para>
+ You have to modify <literal>custom_variable_classes</> and specify
+ <literal>dict_preload.dictfile</>, <literal>dict_preload.afffile</>
+ and <literal>dict_preload.stopwords</>. Ensure <literal>shared_preload_libraries</>
+ contains <literal>dict_preload</>.
+
+ <programlisting>
+ postgres=# CREATE TEXT SEARCH DICTIONARY cspell (template=preloaddict);
+ CREATE TEXT SEARCH DICTIONARY
+
+ postgres=# CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
+ CREATE TEXT SEARCH CONFIGURATION
+ Time: 18,915 ms
+
+ postgres=# ALTER TEXT SEARCH CONFIGURATION cs
+ postgres-# ALTER MAPPING FOR word, asciiword WITH cspell, simple;
+ ALTER TEXT SEARCH CONFIGURATION
+
+ postgres=# select * from ts_debug('cs','vody');
+ alias | description | token | dictionaries | dictionary | lexemes
+ -----------+-------------------+-------+-----------------+------------+---------
+ asciiword | Word, all ASCII | vody | {cspell,simple} | cspell | {voda}
+ (9 rows)
+ </programlisting>
+
+ <programlisting>
+ postgres=# SHOW ALL;
+ ...
+ dict_preload.afffile | czech
+ dict_preload.dictfile | czech
+ dict_preload.stopwords | czech
+ ...
+ </programlisting>
+
+ </para>
+ </sect2>
+
+ </sect1>
*** ./doc/src/sgml/filelist.sgml.orig 2010-02-22 12:47:30.000000000 +0100
--- ./doc/src/sgml/filelist.sgml 2010-03-19 10:36:59.340555069 +0100
***************
*** 101,106 ****
--- 101,107 ----
<!entity cube SYSTEM "cube.sgml">
<!entity dblink SYSTEM "dblink.sgml">
<!entity dict-int SYSTEM "dict-int.sgml">
+ <!entity dict-preload SYSTEM "dict-preload.sgml">
<!entity dict-xsyn SYSTEM "dict-xsyn.sgml">
<!entity earthdistance SYSTEM "earthdistance.sgml">
<!entity fuzzystrmatch SYSTEM "fuzzystrmatch.sgml">
*** ./src/backend/tsearch/regis.c.orig 2010-01-02 17:57:53.000000000 +0100
--- ./src/backend/tsearch/regis.c 2010-03-18 16:41:03.027708600 +0100
***************
*** 71,88 ****
}
static RegisNode *
! newRegisNode(RegisNode *prev, int len)
{
RegisNode *ptr;
! ptr = (RegisNode *) palloc0(RNHDRSZ + len + 1);
if (prev)
prev->next = ptr;
return ptr;
}
void
! RS_compile(Regis *r, bool issuffix, const char *str)
{
int len = strlen(str);
int state = RS_IN_WAIT;
--- 71,89 ----
}
static RegisNode *
! newRegisNode(RegisNode *prev, int len, MemoryContext simple_ctx)
{
RegisNode *ptr;
! ptr = (RegisNode *) MemoryContextAllocZero(simple_ctx,
! RNHDRSZ + len + 1);
if (prev)
prev->next = ptr;
return ptr;
}
void
! RS_compile(Regis *r, bool issuffix, const char *str, MemoryContext simple_ctx)
{
int len = strlen(str);
int state = RS_IN_WAIT;
***************
*** 99,107 ****
if (t_isalpha(c))
{
if (ptr)
! ptr = newRegisNode(ptr, len);
else
! ptr = r->node = newRegisNode(NULL, len);
COPYCHAR(ptr->data, c);
ptr->type = RSF_ONEOF;
ptr->len = pg_mblen(c);
--- 100,108 ----
if (t_isalpha(c))
{
if (ptr)
! ptr = newRegisNode(ptr, len, simple_ctx);
else
! ptr = r->node = newRegisNode(NULL, len, simple_ctx);
COPYCHAR(ptr->data, c);
ptr->type = RSF_ONEOF;
ptr->len = pg_mblen(c);
***************
*** 109,117 ****
else if (t_iseq(c, '['))
{
if (ptr)
! ptr = newRegisNode(ptr, len);
else
! ptr = r->node = newRegisNode(NULL, len);
ptr->type = RSF_ONEOF;
state = RS_IN_ONEOF;
}
--- 110,118 ----
else if (t_iseq(c, '['))
{
if (ptr)
! ptr = newRegisNode(ptr, len, simple_ctx);
else
! ptr = r->node = newRegisNode(NULL, len, simple_ctx);
ptr->type = RSF_ONEOF;
state = RS_IN_ONEOF;
}
*** ./src/backend/utils/mmgr/Makefile.orig 2010-03-18 15:15:23.063794982 +0100
--- ./src/backend/utils/mmgr/Makefile 2010-03-18 15:15:29.040796648 +0100
***************
*** 12,17 ****
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
! OBJS = aset.o mcxt.o portalmem.o
include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
! OBJS = aset.o mcxt.o portalmem.o simple_alloc.o
include $(top_srcdir)/src/backend/common.mk
*** ./src/backend/utils/mmgr/simple_alloc.c.orig 2010-03-18 15:14:59.593793406 +0100
--- ./src/backend/utils/mmgr/simple_alloc.c 2010-03-18 11:10:26.000000000 +0100
***************
*** 0 ****
--- 1,348 ----
+ #include "postgres.h"
+
+ #include "utils/memutils.h"
+
+ typedef struct AllocBlockData *AllocBlock; /* forward reference */
+
+
+ typedef struct SimpleAllocContextData
+ {
+ MemoryContextData header; /* Standard memory-context fields */
+ AllocBlock blocks; /* head of list of standard blocks */
+ AllocBlock bigblocks; /* head of list of blocks larger than initBlockSize */
+ bool isReset;
+ /* Allocation parameters for this context */
+ Size initBlockSize;
+ void *(*external_alloc) (Size size); /* allocate memory */
+ void (*external_free) (void *); /* deallocate memory */
+ } SimpleAllocContextData;
+
+ typedef SimpleAllocContextData *SimpleAlloc;
+
+ typedef struct AllocBlockData
+ {
+ AllocBlock next;
+ Size freesize;
+ char *freeptr;
+ char data[1];
+ } AllocBlockData;
+
+ #define SimpleAllocCtxIsValid(ctx) PointerIsValid(ctx)
+
+ #define ALLOC_BLOCKHDRSZ MAXALIGN(sizeof(AllocBlockData))
+
+ static void SimpleAllocInit(MemoryContext context);
+ static void SimpleAllocReset(MemoryContext context);
+ static void SimpleAllocDelete(MemoryContext context);
+ static void *SimpleAllocAlloc(MemoryContext context, Size size);
+ static void SimpleAllocStats(MemoryContext context, int level);
+ static void *SimpleAllocRealloc(MemoryContext context, void *pointer, Size size);
+ static void SimpleAllocFree(MemoryContext context, void *pointer);
+ static bool SimpleAllocIsEmpty(MemoryContext context);
+ static Size SimpleAllocGetChunkSpace(MemoryContext context, void *pointer);
+
+ #ifdef MEMORY_CONTEXT_CHECKING
+ static void SimpleAllocCheck(MemoryContext context);
+ #endif
+
+ /*
+ * This is the virtual function table for SimpleAlloc context
+ */
+ static MemoryContextMethods SimpleAllocMethods = {
+ SimpleAllocAlloc,
+ SimpleAllocFree, /* not supported */
+ SimpleAllocRealloc, /* not supported */
+ SimpleAllocInit,
+ SimpleAllocReset,
+ SimpleAllocDelete,
+ SimpleAllocGetChunkSpace, /* not supported */
+ SimpleAllocIsEmpty,
+ SimpleAllocStats
+ #ifdef MEMORY_CONTEXT_CHECKING
+ , SimpleAllocCheck
+ #endif
+ };
+
+ #ifdef RANDOMIZE_ALLOCATED_MEMORY
+
+ /*
+ * Fill a just-allocated piece of memory with "random" data. It's not really
+ * very random, just a repeating sequence with a length that's prime. What
+ * we mainly want out of it is to have a good probability that two palloc's
+ * of the same number of bytes start out containing different data.
+ */
+ static void
+ randomize_mem(char *ptr, size_t size)
+ {
+ static int save_ctr = 1;
+ int ctr;
+
+ ctr = save_ctr;
+ while (size-- > 0)
+ {
+ *ptr++ = ctr;
+ if (++ctr > 251)
+ ctr = 1;
+ }
+ save_ctr = ctr;
+ }
+ #endif /* RANDOMIZE_ALLOCATED_MEMORY */
+
+ /*
+ * SimpleAllocContextCreate
+ *
+ *
+ */
+ MemoryContext
+ SimpleAllocContextCreate(MemoryContext parent,
+ const char *name,
+ Size initBlockSize,
+ void *(*external_alloc) (Size size),
+ void (*external_free) (void *))
+ {
+ SimpleAlloc context;
+
+ context = (SimpleAlloc) MemoryContextCreate(T_SimpleAllocContext,
+ sizeof(SimpleAllocContextData),
+ &SimpleAllocMethods,
+ parent,
+ name);
+ context->initBlockSize = MAXALIGN(initBlockSize);
+ context->external_alloc = external_alloc;
+ context->external_free = external_free;
+
+ context->blocks = NULL;
+ context->bigblocks = NULL;
+ context->isReset = true;
+
+ return (MemoryContext) context;
+ }
+
+ /*
+ * Context type methods
+ */
+ static void
+ SimpleAllocInit(MemoryContext context)
+ {
+ /*
+ * do nothing
+ */
+ }
+
+ static void
+ SimpleAllocReset(MemoryContext context)
+ {
+ AllocBlock block;
+ AllocBlock next;
+ SimpleAlloc simple_ctx = (SimpleAlloc) context;
+
+ Assert(SimpleAllocCtxIsValid(simple_ctx));
+
+ /*
+ * because the simple alloc context doesn't support the free operation,
+ * we don't need to implement a keeper block.
+ */
+ block = simple_ctx->blocks;
+ while (block != NULL)
+ {
+ next = block->next;
+ #ifdef CLOBBER_FREED_MEMORY
+ memset(block->data, 0x7F, simple_ctx->initBlockSize);
+ #endif
+
+ if (simple_ctx->external_free != NULL)
+ simple_ctx->external_free(block);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot release allocated memory")));
+ block = next;
+ }
+
+ block = simple_ctx->bigblocks;
+ while (block != NULL)
+ {
+ next = block->next;
+
+ #ifdef CLOBBER_FREED_MEMORY
+ memset(block->data, 0x7F, block->freesize);
+ #endif
+
+ if (simple_ctx->external_free != NULL)
+ simple_ctx->external_free(block);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot release allocated memory")));
+ block = next;
+ }
+
+ simple_ctx->isReset = true;
+ simple_ctx->blocks = NULL;
+ simple_ctx->bigblocks = NULL;
+ }
+
+ static void
+ SimpleAllocDelete(MemoryContext context)
+ {
+ SimpleAllocReset(context);
+ }
+
+ static void *
+ SimpleAllocAlloc(MemoryContext context, Size size)
+ {
+ SimpleAlloc simple_ctx = (SimpleAlloc) context;
+ AllocBlock block;
+ void *newPointer;
+ Size allocSize;
+ bool isBigBlock;
+
+ Assert(SimpleAllocCtxIsValid(simple_ctx));
+
+ /*
+ * Allocate a separate block for any allocation larger than 2/3 of initBlockSize
+ */
+ size = MAXALIGN(size);
+ if (size > (simple_ctx->initBlockSize/3*2))
+ {
+ /*
+ * we will allocate a big block - a block used only for the current
+ * allocation - these blocks are kept in a separate list.
+ */
+ allocSize = size;
+ isBigBlock = true;
+ }
+ else
+ {
+ allocSize = simple_ctx->initBlockSize;
+ isBigBlock = false;
+ }
+
+ /*
+ * Allocate a new block if none is allocated yet, if this is a big
+ * block, or if there isn't enough space in the current one.
+ */
+ if (simple_ctx->blocks == NULL || isBigBlock ||
+ (!isBigBlock && simple_ctx->blocks && simple_ctx->blocks->freesize < size))
+ {
+ /* alloc a new block */
+
+ block = simple_ctx->external_alloc(MAXALIGN(allocSize + sizeof(AllocBlockData) - 1));
+ if (block == NULL)
+ {
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory"),
+ errdetail("Failed on request of size %lu.",
+ (unsigned long) allocSize + sizeof(AllocBlockData) - 1)));
+ }
+
+ block->freesize = allocSize;
+ block->freeptr = block->data;
+ if (!isBigBlock)
+ {
+ block->next = simple_ctx->blocks;
+ simple_ctx->blocks = block;
+ }
+ else
+ {
+ block->next = simple_ctx->bigblocks;
+ simple_ctx->bigblocks = block;
+ }
+ }
+ else
+ block = simple_ctx->blocks;
+
+ Assert(size <= block->freesize);
+ newPointer = block->freeptr;
+
+ if (!isBigBlock)
+ {
+ block->freeptr += size;
+ block->freesize -= size;
+ }
+
+ #ifdef RANDOMIZE_ALLOCATED_MEMORY
+ randomize_mem((char *) newPointer, size);
+ #endif
+ simple_ctx->isReset = false;
+
+ return newPointer;
+ }
+
+ /*
+ * SimpleAllocStats
+ */
+ static void
+ SimpleAllocStats(MemoryContext context, int level)
+ {
+ SimpleAlloc simple_ctx = (SimpleAlloc) context;
+ AllocBlock block;
+ long nblocks = 0;
+ long totalspace = 0;
+ long freespace = 0;
+ int i;
+
+ for (block = simple_ctx->blocks; block != NULL; block = block->next)
+ {
+ nblocks++;
+ totalspace += MAXALIGN(simple_ctx->initBlockSize + sizeof(AllocBlockData) - 1);
+ freespace += block->freesize;
+ }
+
+ for (block = simple_ctx->bigblocks; block != NULL; block = block->next)
+ {
+ nblocks++;
+ totalspace += block->freesize + sizeof(AllocBlockData) - 1;
+ }
+
+ for (i = 0; i < level; i++)
+ fprintf(stderr, " ");
+
+ fprintf(stderr,
+ "%s: %lu total in %ld blocks; %lu free; %lu used\n",
+ simple_ctx->header.name, totalspace, nblocks, freespace,
+ totalspace - freespace);
+ }
+
+ static void *
+ SimpleAllocRealloc(MemoryContext context, void *pointer, Size size)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("simple allocator can't realloc memory")));
+ return NULL; /* keep compiler quiet */
+ }
+
+ static void
+ SimpleAllocFree(MemoryContext context, void *pointer)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("simple allocator can't release memory")));
+ }
+
+ static bool
+ SimpleAllocIsEmpty(MemoryContext context)
+ {
+ return ((SimpleAlloc) context)->isReset;
+ }
+
+ static Size
+ SimpleAllocGetChunkSpace(MemoryContext context, void *pointer)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("simple allocator has no chunks")));
+ return 0; /* keep compiler quiet */
+ }
+
+ #ifdef MEMORY_CONTEXT_CHECKING
+ static void
+ SimpleAllocCheck(MemoryContext context)
+ {
+ /* do nothing */
+ }
+
+ #endif
*** ./src/include/nodes/memnodes.h.orig 2010-01-02 17:58:04.000000000 +0100
--- ./src/include/nodes/memnodes.h 2010-03-18 15:18:55.349792968 +0100
***************
*** 72,77 ****
*/
#define MemoryContextIsValid(context) \
((context) != NULL && \
! (IsA((context), AllocSetContext)))
#endif /* MEMNODES_H */
--- 72,77 ----
*/
#define MemoryContextIsValid(context) \
((context) != NULL && \
! (IsA((context), AllocSetContext) || IsA((context), SimpleAllocContext)))
#endif /* MEMNODES_H */
*** ./src/include/nodes/nodes.h.orig 2010-03-18 15:21:26.328793052 +0100
--- ./src/include/nodes/nodes.h 2010-03-18 15:21:37.257794503 +0100
***************
*** 234,239 ****
--- 234,240 ----
*/
T_MemoryContext = 600,
T_AllocSetContext,
+ T_SimpleAllocContext,
/*
* TAGS FOR VALUE NODES (value.h)
*** ./src/include/tsearch/dicts/regis.h.orig 2010-03-18 16:40:41.584584472 +0100
--- ./src/include/tsearch/dicts/regis.h 2010-03-18 16:40:13.428708569 +0100
***************
*** 40,46 ****
bool RS_isRegis(const char *str);
! void RS_compile(Regis *r, bool issuffix, const char *str);
void RS_free(Regis *r);
/*returns true if matches */
--- 40,46 ----
bool RS_isRegis(const char *str);
! void RS_compile(Regis *r, bool issuffix, const char *str, MemoryContext simple_ctx);
void RS_free(Regis *r);
/*returns true if matches */
*** ./src/include/utils/memutils.h.orig 2010-01-02 17:58:10.000000000 +0100
--- ./src/include/utils/memutils.h 2010-03-18 15:20:48.889794704 +0100
***************
*** 136,139 ****
--- 136,148 ----
#define ALLOCSET_SMALL_INITSIZE (1 * 1024)
#define ALLOCSET_SMALL_MAXSIZE (8 * 1024)
+ /* simple_alloc.c */
+ extern MemoryContext SimpleAllocContextCreate(MemoryContext parent,
+ const char *name,
+ Size initBlockSize,
+ void *(*external_alloc) (Size size),
+ void (*external_free) (void *));
+
+ #define SIMPLEALLOC_LARGE_INITSIZE (1024 * 128)
+
#endif /* MEMUTILS_H */
Pavel Stehule <pavel.stehule@gmail.com> wrote:
I wrote a small patch that allows preloading of a selected ispell
dictionary. It solves the problem of slow tsearch initialisation with
some language configurations. I am afraid this module doesn't help on MS Windows.
I think it should work on all platforms if we include it into the core.
We should continue to research shared memory or mmap approaches.
The fundamental issue seems to be in the slow initialization of
dictionaries. If so, how about adding a pre-compile tool to convert
a dictionary into a binary file, and have each backend simply mmap it?
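The mmap side of that idea could look roughly like this. This is only an illustrative sketch, not part of the patch; the file name and on-disk layout are invented here:

```c
/*
 * Sketch only: map a preprocessed dictionary file read-only.  With
 * PROT_READ + MAP_SHARED the kernel shares one copy of the pages
 * between all backends that map the same file.
 */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file; returns NULL on failure, else sets *len. */
static void *
map_dict_file(const char *path, size_t *len)
{
    struct stat st;
    void       *base;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return NULL;
    *len = (size_t) st.st_size;
    return base;
}
```

The first backend to touch the pages pays the disk I/O; subsequent backends just reuse the page cache.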
BTW, SimpleAllocContextCreate() is not used at all in the patch.
Do you still need it?
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki wrote:
Pavel Stehule <pavel.stehule@gmail.com> wrote:
[...] I am afraid this module doesn't help on MS Windows.
I think it should work on all platforms if we include it into the core.
It will work, as in it will compile and run. It just won't be any
faster. I think that's enough, otherwise you could argue that we
shouldn't have the shared_preload_libraries option at all because it won't
help on Windows.
The fundamental issue seems to be in the slow initialization of
dictionaries. If so, how about adding a pre-compile tool to convert
a dictionary into a binary file, and each backend simply mmap it?
Yeah, that would be better.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
2010/3/23 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
[...]
The fundamental issue seems to be in the slow initialization of
dictionaries. If so, how about adding a pre-compile tool to convert
a dictionary into a binary file, and each backend simply mmap it?
It means loading about 25MB from disk for every first tsearch query -
sorry, I don't believe that can be good.
BTW, SimpleAllocContextCreate() is not used at all in the patch.
Do you still need it?
yes - I needed it. Without the Simple Allocator the cz configuration takes
48MB. Only a few parts have to be supported by the Simple Allocator -
the others have no significant impact - so I didn't uglify more code. In
my first patch I verified that the dictionary data are read-only, so I
was motivated to use the Simple Allocator everywhere. It is not necessary
for the preload method.
Pavel
2010/3/23 Pavel Stehule <pavel.stehule@gmail.com>:
[...]
It means loading about 25MB from disk for every first tsearch query -
sorry, I don't believe that can be good.
The operating system's VM subsystem should make that a non-problem.
"Loading" is also not the word I would use to indicate what mmap does.
Nicolas
2010/3/23 Nicolas Barbier <nicolas.barbier@gmail.com>:
[...]
The operating system's VM subsystem should make that a non-problem.
"Loading" is also not the word I would use to indicate what mmap does.
Maybe we can do some manipulation inside memory - I don't have any
knowledge about mmap. With the Simple Allocator we can have the
dictionary data as one block. The problem is the pointers, but I believe
they can be replaced by offsets.
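The pointers-to-offsets trick can be sketched like this. The names are hypothetical and this is not from the patch; it only shows why an offset-linked block survives being mapped at a different address:

```c
/*
 * Illustrative sketch: link nodes by byte offsets from the block
 * base instead of raw pointers, so a whole allocator block can be
 * dumped to disk and later mapped anywhere.
 */
#include <stdint.h>
#include <string.h>

typedef struct FlatNode
{
    uint32_t    next_off;       /* 0 = end of list; else offset from base */
    char        lexeme[12];
} FlatNode;

/* Follow an offset link; valid at whatever address the block lives. */
static FlatNode *
node_next(char *base, FlatNode *node)
{
    return node->next_off ? (FlatNode *) (base + node->next_off) : NULL;
}

/* Place a node at a nonzero offset and link it after 'prev' (if any). */
static FlatNode *
node_put(char *base, uint32_t off, FlatNode *prev, const char *lexeme)
{
    FlatNode   *node = (FlatNode *) (base + off);

    node->next_off = 0;
    strncpy(node->lexeme, lexeme, sizeof(node->lexeme) - 1);
    node->lexeme[sizeof(node->lexeme) - 1] = '\0';
    if (prev)
        prev->next_off = off;
    return node;
}
```

Note that offset 0 is reserved as the end-of-list sentinel, so real nodes start at a nonzero offset.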
Personally I dislike the idea of a dictionary precompiler - it is
another application to maintain and maybe not necessary. And you would
still need another application for loading.
p.s. I am able to serialise the Czech dictionary, because it uses only simple regexps.
Pavel
Pavel Stehule wrote:
Personally I dislike the idea of a dictionary precompiler - it is
another application to maintain and maybe not necessary.
That's the sort of thing that can be done when first required by any
backend and the results saved in a file for other backends to mmap().
It'd probably want to be opened r/w access-exclusive initially, then
re-opened read-only access-shared when ready for use.
My only concern would be that the cache would want to be forcibly
cleared at postmaster start, so that "restart the postmaster" fixes any
messed-up-cache issues that might arise (not that they should) without
people having to go rm'ing in the datadir. Even if Pg never has any bugs
that result in bad cache files, the file system / bad memory / cosmic
rays / etc can still mangle a cache file.
BTW, mmap() isn't an issue on Windows:
http://msdn.microsoft.com/en-us/library/aa366556%28VS.85%29.aspx
It's spelled CreateFileMapping, but otherwise is fairly similar, and is
perfect for this sort of use.
A shared read-only mapping of processed-and-cached tsearch2 dictionaries
would save a HUGE amount of memory if many backends were using tsearch2
at the same time. It'd make a big difference here.
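A minimal sketch of that cache-file life cycle, under the assumption (mine, not stated in the thread) that publishing is done with an atomic rename() so readers never observe a half-written cache:

```c
/*
 * Sketch only: build the cache into a temporary file, fsync it, then
 * rename() it into place.  rename() is atomic on POSIX file systems,
 * so any concurrent reader sees either the old cache or the new one,
 * never a partial file.  Readers would then map it read-only.
 */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

static int
publish_cache(const char *tmppath, const char *path,
              const void *data, size_t len)
{
    int         fd = open(tmppath, O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) < 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    close(fd);
    return rename(tmppath, path);
}
```

Clearing the cache at postmaster start then reduces to unlinking the published file before any backend maps it.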
--
Craig Ringer
2010/3/24 Craig Ringer <craig@postnewspapers.com.au>:
[...]
A shared read-only mapping of processed-and-cached tsearch2 dictionaries
would save a HUGE amount of memory if many backends were using tsearch2
at the same time. It'd make a big difference here.
If you know this area well, please enhance my first patch. I am not
able to oppose Tom, who has a clear opinion on this patch :(
Pavel
Pavel Stehule wrote:
[...]
If you know this area well, please enhance my first patch. I am not
able to oppose Tom, who has a clear opinion on this patch :(
Should we add a TODO?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
2010/3/24 Bruce Momjian <bruce@momjian.us>:
Pavel Stehule wrote:
[...] If you know this area well, please enhance my first patch. I am not
able to oppose Tom, who has a clear opinion on this patch :(

Should we add a TODO?
why not?
Pavel
Pavel Stehule wrote:
2010/3/24 Bruce Momjian <bruce@momjian.us>:
[...] Should we add a TODO?

why not?
OK, what would the TODO text be?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
Bruce Momjian <bruce@momjian.us> writes:
OK, what would the TODO text be?
I think there are really two tasks here:
* preprocess the textual dictionary definition files into something
that can be slurped directly into memory;
* use mmap() instead of read() to read preprocessed files into memory,
on machines where such a syscall is available.
There would be considerable gain from task #1 even without mmap.
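Task #1 could be as little as a small header followed by the raw in-memory image, so loading becomes a single read() of a known size instead of parsing the .dict/.affix text. The format below is invented here purely for illustration:

```c
/*
 * Sketch of a preprocessed dictionary file: a fixed header
 * (magic + version + payload length) followed by the raw image.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DICT_MAGIC   0x54434944  /* "DICT" */
#define DICT_VERSION 1

typedef struct DictFileHeader
{
    uint32_t    magic;
    uint32_t    version;
    uint32_t    payload_len;
} DictFileHeader;

/* Write header + payload.  Returns 0 on success. */
static int
dict_write(FILE *f, const void *payload, uint32_t len)
{
    DictFileHeader h = {DICT_MAGIC, DICT_VERSION, len};

    if (fwrite(&h, sizeof(h), 1, f) != 1)
        return -1;
    return fwrite(payload, 1, len, f) == len ? 0 : -1;
}

/* Slurp the payload back in one read; NULL on a bad or alien file. */
static void *
dict_read(FILE *f, uint32_t *len)
{
    DictFileHeader h;
    void       *payload;

    if (fread(&h, sizeof(h), 1, f) != 1 ||
        h.magic != DICT_MAGIC || h.version != DICT_VERSION)
        return NULL;
    payload = malloc(h.payload_len);
    if (payload == NULL ||
        fread(payload, 1, h.payload_len, f) != h.payload_len)
    {
        free(payload);
        return NULL;
    }
    *len = h.payload_len;
    return payload;
}
```

The magic and version fields give a cheap way to reject stale or corrupted cache files, which also addresses the earlier concern about messed-up caches.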
regards, tom lane