tsearch_core patch: permissions and security issues
I've been looking at the tsearch patch a bit, and I think there needs to
be more thought given to the permissions required to mess around with
tsearch configuration objects.
The TSParser objects reference functions declared to take and return
INTERNAL arguments. This means that the underlying functions must be
coded in C and can only be installed by a superuser, which in turn means
that there is no scenario where it is really useful for a non-superuser
to execute CREATE PARSER. What's more, allowing a non-superuser to do
it creates security holes: if you can find an unrelated function taking
the right number of INTERNAL arguments, you can install it as a TSParser
support function. That trivially allows crashing the backend, and it
could allow worse security holes than that.
TSDictionary objects have exactly the same issues since they also depend
on functions with INTERNAL arguments.
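To illustrate the hole described above, here is a minimal Python sketch (a toy model, not PostgreSQL code). It assumes a catalog in which the only check available for a support function is that it takes the right number of INTERNAL arguments; the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncSignature:
    name: str
    arg_types: tuple  # SQL-level argument type names


def looks_like_support_function(func, expected_nargs):
    # INTERNAL is opaque at the SQL level, so a signature check can only
    # count arguments; it cannot tell what each pointer really refers to.
    return (len(func.arg_types) == expected_nargs
            and all(t == "internal" for t in func.arg_types))


# Hypothetical functions: one written as a parser support routine, one
# written for a completely different subsystem.
parser_start = FuncSignature("my_parser_start", ("internal", "internal"))
unrelated = FuncSignature("unrelated_support_fn", ("internal", "internal"))

for f in (parser_start, unrelated):
    print(f.name, looks_like_support_function(f, expected_nargs=2))
```

Both functions pass the check, which is the point: nothing at this level distinguishes the intended routine from an unrelated one that happens to have the same argument count.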
At minimum this means that we should restrict CREATE/DROP/ALTER commands
for these objects to superusers. (Which in turn means there's no point
in tracking an ownership column for them; every superuser is the same as
every other one, permissions-wise.) I'm wondering though whether this
doesn't mean that we don't need manipulation commands for them at all.
Is it likely that people will be adding parser or dictionary support to
an installation on the fly? Maybe we can just create 'em all at initdb
time and be done, similar to the way index access methods are treated.
This doesn't say that it's not possible to add more; you can add an
index access method on the fly too, if you want, by inserting stuff into
pg_am by hand. I'm just wondering whether all that SQL-statement
support and pg_dump support for custom parsers and dictionaries is
really worth the code space and future maintenance effort it'll eat up.
You could remove the immediate source of this objection if you could
redesign the APIs for the underlying support functions to be more
type-safe. I'm not sure how feasible or useful that would be though.
The bottom-line question here is whether developing a new parser or
dictionary implementation is really something that ordinary users might
do. If not, then having all this SQL-level support for setting up
catalog entries seems like wasted effort.
TSConfiguration objects are a different story, since they have only
type-safe dependencies on parsers, locales, and dictionaries. But they
still need some more thought about permissions, because AFAICS mucking
with a configuration can invalidate some other user's data. Do we want
to allow runtime changes in a configuration that existing tsvector
columns already depend on? How can we even recognize whether there is
stored data that will be affected by a configuration change? (AFAICS
the patch doesn't put anything into the pg_depend machinery that could
deal with this.) And who gets to decide which configuration is default,
anyway?
I'm also a bit disturbed that you've made searches for TSConfiguration
objects be search-path-sensitive. That is likely to create problems
similar to those we've recently recognized for function lookup, eg,
an insertion into a full-text-indexed column gets treated differently
depending on the caller's search path. It's particularly bad to have
the default object be search-path-dependent. We learned the hard way
not to do that for default index operator classes; let's not make the
same mistake again for tsearch configurations.
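The search-path concern can be sketched in a few lines of Python. This is a toy model of unqualified-name lookup, not the patch's code; the schema and configuration names are hypothetical.

```python
def resolve_config(name, search_path, catalog):
    """Return the first schema-qualified match for an unqualified name."""
    for schema in search_path:
        if (schema, name) in catalog:
            return f"{schema}.{name}"
    raise LookupError(f"no configuration named {name!r} on the search path")


# Hypothetical catalog: two configurations share the unqualified name.
catalog = {("public", "english"), ("app", "english")}

writer_path = ["app", "public"]   # session that inserts rows
reader_path = ["public"]          # session that later searches them

print(resolve_config("english", writer_path, catalog))
print(resolve_config("english", reader_path, catalog))
```

The two sessions resolve the same name to different configurations, so the row is indexed under one set of rules and searched under another.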
Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept. Seems like the mapping from
token types to dictionaries is really a property of a configuration,
and we ought to be handling it through options of CREATE/ALTER
CONFIGURATION commands, not as an apparently independent object type.
The way the patch is doing it feels like implementing CREATE ATTRIBUTE
as a separate command instead of having ALTER TABLE ADD COLUMN; it's
just weird, and it's not obvious that dropping a configuration should
make the associated mapping object go away.
Lastly, I'm unhappy that the patch still keeps a lot of configuration
information, such as stop word lists, in the filesystem rather than the
database. It seems to me that the single easiest and most useful part
of a configuration to change is the stop word list; but this setup
guarantees that no one but a DBA can do that, and what's more that
pg_dump won't record your changes. What's the point of having any
non-superuser configuration capability at all, if stop words aren't part
of what you can change?
regards, tom lane
You bring up a very good point. There are fifteen new commands being
added for full text indexing:
alter-fulltext-config.sgml alter-fulltext-owner.sgml
create-fulltext-dict.sgml drop-fulltext-dict.sgml
alter-fulltext-dict.sgml alter-fulltext-parser.sgml
create-fulltext-map.sgml drop-fulltext-map.sgml
alter-fulltext-dictset.sgml comment-fulltext.sgml
create-fulltext-parser.sgml drop-fulltext-parser.sgml
alter-fulltext-map.sgml create-fulltext-config.sgml
drop-fulltext-config.sgml
I think encoding is a good example to follow. We allow users to create
new conversions (CREATE CONVERSION), but we don't allow them to create
new encodings --- those are hard-coded in the backend. Which of the
following full text objects:
config
dict
map
dictset
parser
can we hard-code into the backend, and just update for every major
release like we do for encodings?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
You could remove the immediate source of this objection if you could
redesign the APIs for the underlying support functions to be more
type-safe. I'm not sure how feasible or useful that would be though.
The bottom-line question here is whether developing a new parser or
dictionary implementation is really something that ordinary users might
do. If not, then having all this SQL-level support for setting up
catalog entries seems like wasted effort.
Well assuming we have any SQL-level support at all I think we should strive to
avoid these functions taking INTERNAL arguments.
I feel like having them in the GIST interface has been a major impediment to
more people defining GIST indexes for more datatypes. Because you need to
write C code dealing with internal data structures to handle page splits, the
bar to implement GIST index operator classes is too high for most users. So
instead of a simple SQL command we end up with contrib modules implementing
each type of GIST index.
A while back I proposed that we implement the same page-split algorithm that
most (or all?) of those contrib modules copy-paste between them as a default
implementation. That would allow defining a GIST index in terms of a handful
of operators like "distance" which could be defined with a type-safe api. This
would be less flexible than the existing generic solution but it would allow
defining new GIST indexes without writing C code.
But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.
Ouch. Could mucking with a configuration create a corrupt index?
This sounds sort of analogous to the issues collations bring up.
It seems to me that the single easiest and most useful part of a
configuration to change is the stop word list; but this setup guarantees
that no one but a DBA can do that, and what's more that pg_dump won't record
your changes.
I would second that, in the past I was expected to provide an administrative
web interface to adjust the list of stop words.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.
I don't think I want to get into redesigning the patch at that level of
detail, at least not for 8.3. It seems like something possibly worth
thinking about for 8.4 though. The idea that we might want to change
the API for parser and dictionary support routines seems like another
good argument for not exposing user-level facilities for creating them
right now.
What I'm realizing as I look at it is that this is an enormous patch,
and it's not as close to being ready to apply as I had supposed. If we
don't scale it back, then either it doesn't get into 8.3 or 8.3 gets
delayed a whole lot longer. So we need to look at what we can trim or
postpone for a later release.
So all these factors seem to me to point in the same direction: at least
for the time being, we should treat TS parsers and dictionaries the way
we treat index access methods. There'll be a catalog, which the
adventurous can insert new entries into, but no SQL-level support for
doing it, hence no pg_dump support. And we reserve the right to whack
around the API for the functions referenced by the catalog entries.
That still leaves us with the question of SQL-level support for TS
configurations, which are built on top of parsers and dictionaries.
We definitely need some level of capability for that. For the
permissions and dependencies issues, the minimalistic approach is to
say "only superusers can create or alter TS configurations, and if you
alter one it's your responsibility to fix up any dependent tsvector
columns or indexes." We currently handle index operator classes that
way, so it's not completely ridiculous. Sure it would be nice to do
better, but maybe that's a post-8.3 project.
That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.
regards, tom lane
Tom Lane wrote:
Gregory Stark <stark@enterprisedb.com> writes:
Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.
That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.
O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?
Sincerely,
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
"Joshua D. Drake" <jd@commandprompt.com> writes:
O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?
Well, if you want to take a hard-nosed approach, no form of the patch
would gain us anything over leaving it in contrib, at least not from a
functionality standpoint. The argument in favor has always been about
perception, really: if it's a "core" feature not an "add-on", then
people will take it more seriously. And there's a rather weak
ease-of-use argument that you don't have to install a contrib module.
(The idea that it's targeted at people who can't or won't install a
contrib module is another reason why I think we can skip user-defined
parsers and dictionaries ...)
regards, tom lane
Joshua D. Drake wrote:
Tom Lane wrote:
Gregory Stark <stark@enterprisedb.com> writes:
Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.
That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.
O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?
The idea is that common operations like searching and mapping dictionaries
will be easier to do, but the more complex stuff will require catalog
manipulations.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
"Joshua D. Drake" <jd@commandprompt.com> writes:
O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?
Well, if you want to take a hard-nosed approach, no form of the patch
would gain us anything over leaving it in contrib, at least not from a
functionality standpoint. The argument in favor has always been about
perception, really: if it's a "core" feature not an "add-on", then
people will take it more seriously. And there's a rather weak
ease-of-use argument that you don't have to install a contrib module.
(The idea that it's targeted at people who can't or won't install a
contrib module is another reason why I think we can skip user-defined
parsers and dictionaries ...)
Well my argument has always been the "core" feature argument. Perhaps I
am missing some info here, but when I read what you wrote, I read that
Tsearch will now be "harder" to work with. Not easier. :(
Removal of pg_dump support kind of hurts us, as we already have problems
with pg_dump support and tsearch2. Adding work to have to re-assign
permissions to vector columns because we make changes...
I would grant that having the SQL extensions would certainly be nice.
Anyway, I am not trying to stop the progress. I would like to see
Tsearch2 in core but I also don't want to add complexity. You did say here:
And we reserve the right to whack around the API for the functions
referenced by the catalog entries.
Which kind of gets us back to upgrade problems doesn't it?
Sincerely,
Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
Well my argument has always been the "core" feature argument. Perhaps I
am missing some info here, but when I read what you wrote, I read that
Tsearch will now be "harder" to work with. Not easier. :(
Then you misread it. What I was proposing was essentially that there
won't be any need for pg_dump support because everything's built-in
(at least as far as parsers/dictionaries go).
As for the permissions issues, that's just formalizing something that's
true today with the contrib module: if you change a configuration, it's
*your* problem whether that invalidates any table entries, the system
won't take care of it for you.
regards, tom lane
Tom Lane wrote:
"Joshua D. Drake" <jd@commandprompt.com> writes:
Well my argument has always been the "core" feature argument. Perhaps I
am missing some info here, but when I read what you wrote, I read that
Tsearch will now be "harder" to work with. Not easier. :(
Then you misread it. What I was proposing was essentially that there
won't be any need for pg_dump support because everything's built-in
(at least as far as parsers/dictionaries go).
As for the permissions issues, that's just formalizing something that's
true today with the contrib module: if you change a configuration, it's
*your* problem whether that invalidates any table entries, the system
won't take care of it for you.
O.k. :)
Joshua D. Drake
In an attempt to communicate what full text search does, and what
features we are thinking of adding/removing, I have put up the
introduction in HTML:
http://momjian.us/expire/fulltext/HTML/fulltext-intro.html
The links to the other sections don't work yet.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.
That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.
A possible solution is to split pg_ts_dict into two tables, like pg_am and
pg_opclass. (I'll talk about dictionaries; the same approach is possible for
parsers, but for now that looks like overdesign.)
The first table, pg_ts_dict_template (I don't know the exact name yet),
contains the columns oid, template_name, dict_init, dict_lexize. The second,
pg_ts_dict, contains the columns oid, template_oid, owner, schema,
dict_initoption.
CREATE/ALTER/DROP DICTIONARY affects only the second table; access to the
first one is only by select/update/insert/delete, similar to pg_am.
IMHO, this interface solves the problems with security and dumping.
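The two-table split can be sketched as follows. This is a Python toy model under stated assumptions: the field names follow the columns listed above, while the helper `create_dictionary`, the oid values, and the C function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictTemplate:
    """Row of the proposed template table; edited by hand, like pg_am."""
    oid: int
    template_name: str
    dict_init: str    # name of a C function taking INTERNAL arguments
    dict_lexize: str  # likewise


@dataclass(frozen=True)
class Dictionary:
    """Row of pg_ts_dict; the only table the SQL commands would touch."""
    oid: int
    template_oid: int
    owner: str
    schema: str
    dict_initoption: str


def create_dictionary(templates, dictionaries, oid, template_name,
                      owner, schema, initoption):
    # Models CREATE DICTIONARY: it may only reference an existing template,
    # so it never names a C function directly.
    template = next(t for t in templates if t.template_name == template_name)
    row = Dictionary(oid, template.oid, owner, schema, initoption)
    dictionaries.append(row)
    return row


templates = [DictTemplate(1, "ispell", "ispell_init_fn", "ispell_lexize_fn")]
dictionaries = []
create_dictionary(templates, dictionaries, 100, "ispell",
                  "alice", "public", "DictFile=english")
create_dictionary(templates, dictionaries, 101, "ispell",
                  "bob", "public", "DictFile=russian")
print(len(templates), len(dictionaries))
```

Under this model a non-superuser can create any number of dictionaries, but the unsafe function references live only in the template rows.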
The reason to keep an SQL-ish interface to dictionaries is simplicity of
configuration. Snowball's stemmers are useful as is, but an ispell dictionary
requires some configuration before it can be used.
Next, INTERNAL arguments are used in the parser and dictionary APIs for
performance reasons. Creating a tsvector from text involves a lot of calls to
parsers and dictionaries, and the internal structures holding their state may
be rather complex and cannot be represented by any pgsql type, not even as a
flat memory structure.
Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept.
ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
REPLACE olddictname TO newdictname;
ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS] FOR tokentypename;
Is it looking reasonable?
TSConfiguration objects are a different story, since they have only
type-safe dependencies on parsers, locales, and dictionaries. But they
still need some more thought about permissions, because AFAICS mucking
with a configuration can invalidate some other user's data. Do we want
to allow runtime changes in a configuration that existing tsvector
columns already depend on? How can we even recognize whether there is
stored data that will be affected by a configuration change? (AFAICS
It's a very complex task: experienced users may use several configurations
simultaneously. For example, indexing may use a configuration that doesn't
reject stop words, while default searching uses a configuration that does
reject them. BTW, the same effects can be produced by changing a dictionary.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
can we hard-code into the backend, and just update for every major
release like we do for encodings?
Sorry, none of them :(. We know of projects that introduce a new parser or a
new dictionary. Configs and maps are changed very often.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.
ouch. could mucking with a configuration create a corrupt index?
It depends on what you mean by 'corrupted'. It will not be corrupted in the
sense of being unreadable or causing a backend crash. But the usefulness of
such a tsvector column could be limited: not all words will be searchable.
This sounds sort of analogous to the issues collation bring up.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes:
The reason to save SQLish interface to dictionaries is a simplicity of
configuration. Snowball's stemmers are useful as is, but ispell dictionary
requires some configuration action before using.
Yeah. I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all? It would mean that
dict_init functions would have to guard themselves against invalid
options, but they probably ought to do that anyway. If we did that,
I think we could have a fixed set of dictionaries without too much
problem, and focus on just configurations as being user-alterable.
Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept.
ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
REPLACE olddictname TO newdictname;
ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS] FOR tokentypename;
Is it looking reasonable?
Er ... what's the difference between the second and third forms?
regards, tom lane
"Teodor Sigaev" <teodor@sigaev.ru> writes:
But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.
ouch. could mucking with a configuration create a corrupt index?
Depending on what you mean 'corrupted'. It will not corrupted as non-readable
or cause backend crash. But usage of such tsvector column could be limited -
not all words will be searchable.
Am I correct to think of this like changing collations leaving your btree
index "corrupt"? In that case it probably won't cause any backend crash either
but you will get incorrect results. For example, returning different results
depending on whether the index or a full table scan is used.
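A toy Python model of that situation, assuming a "configuration" is nothing more than a stop-word list and that the stored vector plays the role of the index (the helper names and sample text are hypothetical):

```python
def toy_vector(text, stop_words):
    """Lowercase, split on whitespace, drop stop words."""
    return frozenset(w for w in text.lower().split() if w not in stop_words)


def matches(vector, query_word, stop_words):
    query = toy_vector(query_word, stop_words)
    return bool(query) and query <= vector


old_stop_words = {"the", "who"}
new_stop_words = {"the"}          # configuration altered after indexing

document = "The Who live"
stored = toy_vector(document, old_stop_words)   # computed at insert time

# Search for "who" under the new configuration.
via_stored = matches(stored, "who", new_stop_words)
via_recompute = matches(toy_vector(document, new_stop_words),
                        "who", new_stop_words)
print(via_stored, via_recompute)
```

The stored vector misses the word while a fresh computation finds it, which mirrors the index-versus-full-scan discrepancy: nothing crashes, but the answers disagree until the stored data is rebuilt.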
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
Teodor Sigaev <teodor@sigaev.ru> writes:
The reason to save SQLish interface to dictionaries is a simplicity of
configuration. Snowball's stemmers are useful as is, but ispell dictionary
requires some configuration action before using.
Yeah. I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all? It would mean that
It should be. Instances of ispell (and synonym, thesaurus) dictionaries differ
only in the dict_initoption part, so there will be only one entry in
pg_ts_dict_template and several in pg_ts_dict.
ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];
Sets the list of dictionaries for the given token type(s).
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
REPLACE olddictname TO newdictname;
Replaces one dictionary with another in the dictionary list for the given
token type(s). This command is very useful for tweaking a configuration, and
for creating a new configuration that differs from an existing one only in a
pair of dictionaries.
ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS] FOR tokentypename;
Is it looking reasonable?
Er ... what's the difference between the second and third forms?
Those changes are doable within several days. I'd like to make them together
with replacing the FULLTEXT keyword with TEXT SEARCH, as you suggested.
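The difference between the WITH and REPLACE forms described above can be modelled in Python. This is a sketch, assuming a configuration's mapping is a table from token type to an ordered list of dictionaries, and assuming that omitting FOR in the REPLACE form means "all token types"; the token type and dictionary names are hypothetical.

```python
def alter_mapping_with(mapping, token_types, dicts):
    """ALTER MAPPING FOR ... WITH ...: overwrite the whole dictionary list."""
    for token_type in token_types:
        mapping[token_type] = list(dicts)


def alter_mapping_replace(mapping, old, new, token_types=None):
    """ALTER MAPPING [FOR ...] REPLACE old TO new: swap one dictionary,
    keeping the position and the rest of each list unchanged."""
    targets = list(mapping) if token_types is None else token_types
    for token_type in targets:
        mapping[token_type] = [new if d == old else d
                               for d in mapping[token_type]]


mapping = {
    "word": ["en_ispell", "en_stem"],
    "hword": ["en_ispell", "en_stem"],
    "email": ["simple"],
}

alter_mapping_replace(mapping, "en_stem", "ru_stem")
alter_mapping_with(mapping, ["email"], ["simple", "ru_stem"])
print(mapping)
```

With WITH, the caller must restate the full list for every token type; REPLACE changes one dictionary everywhere it occurs, which is what makes it convenient for deriving a new configuration from an existing one.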
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Thu, 14 Jun 2007, Tom Lane wrote:
Teodor Sigaev <teodor@sigaev.ru> writes:
The reason to save SQLish interface to dictionaries is a simplicity of
configuration. Snowball's stemmers are useful as is, but ispell dictionary
requires some configuration action before using.
Yeah. I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all? It would mean that
dict_init functions would have to guard themselves against invalid
options, but they probably ought to do that anyway. If we did that,
I think we could have a fixed set of dictionaries without too much
problem, and focus on just configurations as being user-alterable.
Currently, all the dictionaries we provide are template dictionaries, so
users can change only their parameters.
But there are reasons to allow users to register new templates, and in fact we
know of people/projects with application-dependent dictionaries.
How could they dump/reload their dictionaries?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
But, there are reasons to allow users register new templates and in fact
we know people/projects with application-dependent dictionaries. How
they could dump/reload their dictionaries ?
The same way as pg_am does.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Thu, 14 Jun 2007, Gregory Stark wrote:
"Teodor Sigaev" <teodor@sigaev.ru> writes:
But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.
ouch. could mucking with a configuration create a corrupt index?
Depending on what you mean 'corrupted'. It will not corrupted as non-readable
or cause backend crash. But usage of such tsvector column could be limited -
not all words will be searchable.
Am I correct to think of this like changing collations leaving your btree
index "corrupt"? In that case it probably won't cause any backend crash either
but you will get incorrect results. For example, returning different results
depending on whether the index or a full table scan is used.
You're correct. But we can't defend users from all possible errors.
On the other hand, we need to somehow help the user identify which fts
configuration was used to produce a tsvector. For example, a comment on the
tsvector column would be useful, but we don't know how to do this
automatically.
Regards,
Oleg