tsearch_core patch: permissions and security issues

Started by Tom Lane almost 19 years ago | 143 messages | hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

I've been looking at the tsearch patch a bit, and I think there needs to
be more thought given to the permissions required to mess around with
tsearch configuration objects.

The TSParser objects reference functions declared to take and return
INTERNAL arguments. This means that the underlying functions must be
coded in C and can only be installed by a superuser, which in turn means
that there is no scenario where it is really useful for a non-superuser
to execute CREATE PARSER. What's more, allowing a non-superuser to do
it creates security holes: if you can find an unrelated function taking
the right number of INTERNAL arguments, you can install it as a TSParser
support function. That trivially allows crashing the backend, and it
could allow worse security holes than that.

TSDictionary objects have exactly the same issues since they also depend
on functions with INTERNAL arguments.
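
To illustrate the hole described above, here is a minimal Python sketch (not PostgreSQL code). It assumes a hypothetical catalog in which a support-function slot can be validated only by comparing declared argument type names. The names FuncEntry, signature_matches, PARSER_SLOT and all three example functions are invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FuncEntry:
    """Hypothetical catalog row: a function name and its declared argument types."""
    name: str
    arg_types: Tuple[str, ...]


def signature_matches(func: FuncEntry, expected: Tuple[str, ...]) -> bool:
    # The only check possible at CREATE time: compare declared type names.
    # "internal" says nothing about what the pointer actually refers to.
    return func.arg_types == expected


# Hypothetical slot for a parser support function taking two INTERNAL arguments.
PARSER_SLOT = ("internal", "internal")

real_parser_func = FuncEntry("my_parser_nexttoken", ("internal", "internal"))
unrelated_func = FuncEntry("some_index_support", ("internal", "internal"))
typed_func = FuncEntry("length_of_text", ("text",))

accepts_real = signature_matches(real_parser_func, PARSER_SLOT)
accepts_unrelated = signature_matches(unrelated_func, PARSER_SLOT)
accepts_typed = signature_matches(typed_func, PARSER_SLOT)
```

In this toy model the unrelated function is accepted exactly like the real one, while a function with a concrete argument type is rejected. That is the sense in which a more type-safe API would remove the objection.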

At minimum this means that we should restrict CREATE/DROP/ALTER commands
for these objects to superusers. (Which in turn means there's no point
in tracking an ownership column for them; every superuser is the same as
every other one, permissions-wise.) I'm wondering though whether this
doesn't mean that we don't need manipulation commands for them at all.
Is it likely that people will be adding parser or dictionary support to
an installation on the fly? Maybe we can just create 'em all at initdb
time and be done, similar to the way index access methods are treated.
This doesn't say that it's not possible to add more; you can add an
index access method on the fly too, if you want, by inserting stuff into
pg_am by hand. I'm just wondering whether all that SQL-statement
support and pg_dump support for custom parsers and dictionaries is
really worth the code space and future maintenance effort it'll eat up.

You could remove the immediate source of this objection if you could
redesign the APIs for the underlying support functions to be more
type-safe. I'm not sure how feasible or useful that would be though.
The bottom-line question here is whether developing a new parser or
dictionary implementation is really something that ordinary users might
do. If not, then having all this SQL-level support for setting up
catalog entries seems like wasted effort.

TSConfiguration objects are a different story, since they have only
type-safe dependencies on parsers, locales, and dictionaries. But they
still need some more thought about permissions, because AFAICS mucking
with a configuration can invalidate some other user's data. Do we want
to allow runtime changes in a configuration that existing tsvector
columns already depend on? How can we even recognize whether there is
stored data that will be affected by a configuration change? (AFAICS
the patch doesn't put anything into the pg_depend machinery that could
deal with this.) And who gets to decide which configuration is default,
anyway?

I'm also a bit disturbed that you've made searches for TSConfiguration
objects be search-path-sensitive. That is likely to create problems
similar to those we've recently recognized for function lookup, eg,
an insertion into a full-text-indexed column gets treated differently
depending on the caller's search path. It's particularly bad to have
the default object be search-path-dependent. We learned the hard way
not to do that for default index operator classes; let's not make the
same mistake again for tsearch configurations.
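
A small Python sketch of the search-path problem, under the assumption that lookup simply returns the first schema on the path that contains the name. The catalog contents, schema names and the function resolve_config are hypothetical and only model the behaviour being criticized.

```python
from typing import Dict, List, Optional

# Hypothetical catalog: schema name -> {configuration name -> description}.
CATALOG: Dict[str, Dict[str, str]] = {
    "app": {"default_cfg": "english stemming, stop words removed"},
    "public": {"default_cfg": "simple, no stemming"},
}


def resolve_config(name: str, search_path: List[str]) -> Optional[str]:
    """Return 'schema.name' for the first match along search_path, or None."""
    for schema in search_path:
        if name in CATALOG.get(schema, {}):
            return f"{schema}.{name}"
    return None


# Two callers inserting into the same table, with different search paths.
owner_view = resolve_config("default_cfg", ["app", "public"])
other_view = resolve_config("default_cfg", ["public"])
```

The two callers end up with different configurations for the same unqualified name, so the same text could be turned into different tsvector values depending on who performs the insert.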

Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept. Seems like the mapping from
token types to dictionaries is really a property of a configuration,
and we ought to be handling it through options of CREATE/ALTER
CONFIGURATION commands, not as an apparently independent object type.
The way the patch is doing it feels like implementing CREATE ATTRIBUTE
as a separate command instead of having ALTER TABLE ADD COLUMN; it's
just weird, and it's not obvious that dropping a configuration should
make the associated mapping object go away.

Lastly, I'm unhappy that the patch still keeps a lot of configuration
information, such as stop word lists, in the filesystem rather than the
database. It seems to me that the single easiest and most useful part
of a configuration to change is the stop word list; but this setup
guarantees that no one but a DBA can do that, and what's more that
pg_dump won't record your changes. What's the point of having any
non-superuser configuration capability at all, if stop words aren't part
of what you can change?

regards, tom lane

#2 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: tsearch_core patch: permissions and security issues

You bring up a very good point. There are fifteen new commands being
added for full text indexing:

alter-fulltext-config.sgml alter-fulltext-owner.sgml
create-fulltext-dict.sgml drop-fulltext-dict.sgml
alter-fulltext-dict.sgml alter-fulltext-parser.sgml
create-fulltext-map.sgml drop-fulltext-map.sgml
alter-fulltext-dictset.sgml comment-fulltext.sgml
create-fulltext-parser.sgml drop-fulltext-parser.sgml
alter-fulltext-map.sgml create-fulltext-config.sgml
drop-fulltext-config.sgml

I think encoding is a good example to follow. We allow users to create
new conversions (CREATE CONVERSION), but we don't allow them to create
new encodings --- those are hard-coded in the backend. Which of the
following full text objects:

config
dict
map
dictset
parser

can we hard-code into the backend, and just update for every major
release like we do for encodings?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#3 Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#1)
Re: tsearch_core patch: permissions and security issues

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

You could remove the immediate source of this objection if you could
redesign the APIs for the underlying support functions to be more
type-safe. I'm not sure how feasible or useful that would be though.
The bottom-line question here is whether developing a new parser or
dictionary implementation is really something that ordinary users might
do. If not, then having all this SQL-level support for setting up
catalog entries seems like wasted effort.

Well assuming we have any SQL-level support at all I think we should strive to
avoid these functions taking INTERNAL arguments.

I feel like having them in the GIST interface has been a major impediment to
more people defining GIST indexes for more datatypes. Because you need to
write C code dealing with internal data structures to handle page splits the
bar to implement GIST index operator classes is too high for most users. So
instead of a simple SQL command we end up with contrib modules implementing
each type of GIST index.

A while back I proposed that we implement the same page-split algorithm that
most (or all?) of those contrib modules copy-paste between them as a default
implementation. That would allow defining a GIST index in terms of a handful
of operators like "distance" which could be defined with a type-safe api. This
would be less flexible than the existing generic solution but it would allow
defining new GIST indexes without writing C code.

But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.

ouch. could mucking with a configuration create a corrupt index?

This sounds sort of analogous to the issues collations bring up.

It seems to me that the single easiest and most useful part of a
configuration to change is the stop word list; but this setup guarantees
that no one but a DBA can do that, and what's more that pg_dump won't record
your changes.

I would second that, in the past I was expected to provide an administrative
web interface to adjust the list of stop words.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#3)
Re: tsearch_core patch: permissions and security issues

Gregory Stark <stark@enterprisedb.com> writes:

Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.

I don't think I want to get into redesigning the patch at that level of
detail, at least not for 8.3. It seems like something possibly worth
thinking about for 8.4 though. The idea that we might want to change
the API for parser and dictionary support routines seems like another
good argument for not exposing user-level facilities for creating them
right now.

What I'm realizing as I look at it is that this is an enormous patch,
and it's not as close to being ready to apply as I had supposed. If we
don't scale it back, then either it doesn't get into 8.3 or 8.3 gets
delayed a whole lot longer. So we need to look at what we can trim or
postpone for a later release.

So all these factors seem to me to point in the same direction: at least
for the time being, we should treat TS parsers and dictionaries the way
we treat index access methods. There'll be a catalog, which the
adventurous can insert new entries into, but no SQL-level support for
doing it, hence no pg_dump support. And we reserve the right to whack
around the API for the functions referenced by the catalog entries.

That still leaves us with the question of SQL-level support for TS
configurations, which are built on top of parsers and dictionaries.
We definitely need some level of capability for that. For the
permissions and dependencies issues, the minimalistic approach is to
say "only superusers can create or alter TS configurations, and if you
alter one it's your responsibility to fix up any dependent tsvector
columns or indexes." We currently handle index operator classes that
way, so it's not completely ridiculous. Sure it would be nice to do
better, but maybe that's a post-8.3 project.

That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.

regards, tom lane

#5 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#4)
Re: tsearch_core patch: permissions and security issues

Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.

That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.

O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#5)
Re: tsearch_core patch: permissions and security issues

"Joshua D. Drake" <jd@commandprompt.com> writes:

O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?

Well, if you want to take a hard-nosed approach, no form of the patch
would gain us anything over leaving it in contrib, at least not from a
functionality standpoint. The argument in favor has always been about
perception, really: if it's a "core" feature not an "add-on", then
people will take it more seriously. And there's a rather weak
ease-of-use argument that you don't have to install a contrib module.
(The idea that it's targeted at people who can't or won't install a
contrib module is another reason why I think we can skip user-defined
parsers and dictionaries ...)

regards, tom lane

#7 Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#5)
Re: tsearch_core patch: permissions and security issues

Joshua D. Drake wrote:

Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.

That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.

O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?

The idea is that common operations like searching and mapping dictionaries
will be easier to do, but the more complex stuff will require catalog
manipulations.

#8 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#6)
Re: tsearch_core patch: permissions and security issues

Tom Lane wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

O.k. I am not trying to throw any cold water on this, but with the
limitations we are suggesting, does the patch gain us anything over just
leaving tsearch in contrib?

Well, if you want to take a hard-nosed approach, no form of the patch
would gain us anything over leaving it in contrib, at least not from a
functionality standpoint. The argument in favor has always been about
perception, really: if it's a "core" feature not an "add-on", then
people will take it more seriously. And there's a rather weak
ease-of-use argument that you don't have to install a contrib module.
(The idea that it's targeted at people who can't or won't install a
contrib module is another reason why I think we can skip user-defined
parsers and dictionaries ...)

Well my argument has always been the "core" feature argument. Perhaps I
am missing some info here, but when I read what you wrote, I read that
Tsearch will now be "harder" to work with. Not easier. :(

Removal of pg_dump support kind of hurts us, as we already have problems
with pg_dump support and tsearch2. Adding work to have to re-assign
permissions to vector columns because we make changes...

I would grant that having the SQL extensions would certainly be nice.

Anyway, I am not trying to stop the progress. I would like to see
Tsearch2 in core but I also don't want to add complexity. You did say here:

And we reserve the right to whack around the API for the functions
referenced by the catalog entries.

Which kind of gets us back to upgrade problems doesn't it?

Sincerely,

Joshua D. Drake

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#8)
Re: tsearch_core patch: permissions and security issues

"Joshua D. Drake" <jd@commandprompt.com> writes:

Well my argument has always been the "core" feature argument. Perhaps I
am missing some info here, but when I read what you wrote, I read that
Tsearch will now be "harder" to work with. Not easier. :(

Then you misread it. What I was proposing was essentially that there
won't be any need for pg_dump support because everything's built-in
(at least as far as parsers/dictionaries go).

As for the permissions issues, that's just formalizing something that's
true today with the contrib module: if you change a configuration, it's
*your* problem whether that invalidates any table entries, the system
won't take care of it for you.

regards, tom lane

#10 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#9)
Re: tsearch_core patch: permissions and security issues

Tom Lane wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

Well my argument has always been the "core" feature argument. Perhaps I
am missing some info here, but when I read what you wrote, I read that
Tsearch will now be "harder" to work with. Not easier. :(

Then you misread it. What I was proposing was essentially that there
won't be any need for pg_dump support because everything's built-in
(at least as far as parsers/dictionaries go).

As for the permissions issues, that's just formalizing something that's
true today with the contrib module: if you change a configuration, it's
*your* problem whether that invalidates any table entries, the system
won't take care of it for you.

O.k. :)

Joshua D. Drake

#11 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#9)
Re: tsearch_core patch: permissions and security issues

In an attempt to communicate what full text search does, and what
features we are thinking of adding/removing, I have put up the
introduction in HTML:

http://momjian.us/expire/fulltext/HTML/fulltext-intro.html

The links to the other sections don't work yet.

#12 Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#4)
Re: tsearch_core patch: permissions and security issues

Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.

That gets us down to just needing to worry about whether we like the
SQL representation of configurations. Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.

A possible solution is to split pg_ts_dict into two tables, like pg_am and
pg_opclass. (I'll talk about dictionaries; the same approach is possible for
parsers, but for now that looks like overdesign.)
The first table, pg_ts_dict_template (I don't know the exact name yet),
contains the columns oid, template_name, dict_init, dict_lexize. The second,
pg_ts_dict, has the columns oid, template_oid, owner, schema, dict_initoption.

CREATE/ALTER/DROP DICTIONARY affects only the second table, and access to the
first one is only by select/update/insert/delete, similar to pg_am.

IMHO, this interface solves problems with security and dumping.
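
The split proposed above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the patch's catalog code: the column names follow the message, while register_template, create_dictionary, the toy template and its option format are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DictTemplate:
    """Row of the (tentatively named) pg_ts_dict_template: the C-level entry points."""
    template_name: str
    dict_init: Callable[[str], dict]
    dict_lexize: Callable[[dict, str], List[str]]


@dataclass(frozen=True)
class Dictionary:
    """Row of pg_ts_dict: an instance of a template plus its init option."""
    name: str
    template_name: str
    owner: str
    schema: str
    dict_initoption: str


TEMPLATES: Dict[str, DictTemplate] = {}
DICTIONARIES: Dict[str, Dictionary] = {}


def register_template(tmpl: DictTemplate, is_superuser: bool) -> None:
    # Stand-in for direct catalog manipulation, as is done with pg_am.
    if not is_superuser:
        raise PermissionError("only a superuser may add a template")
    TEMPLATES[tmpl.template_name] = tmpl


def create_dictionary(d: Dictionary) -> None:
    # Stand-in for CREATE DICTIONARY: it may only reference an existing
    # template, so no user-chosen function is ever installed.
    if d.template_name not in TEMPLATES:
        raise LookupError(f"no such template: {d.template_name}")
    DICTIONARIES[d.name] = d


def toy_init(option: str) -> dict:
    # Hypothetical option format: a comma-separated stop word list.
    return {"stop": set(option.split(",")) if option else set()}


def toy_lexize(state: dict, word: str) -> List[str]:
    w = word.lower()
    return [] if w in state["stop"] else [w]


register_template(DictTemplate("toy", toy_init, toy_lexize), is_superuser=True)
create_dictionary(Dictionary("toy_en", "toy", "alice", "public", "the,a"))
create_dictionary(Dictionary("toy_plain", "toy", "bob", "public", ""))
```

Both dictionaries share one template and differ only in dict_initoption, so dumping them would need only the option string and the template name, not any function definitions.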

The reason to keep an SQLish interface to dictionaries is simplicity of
configuration. Snowball's stemmers are useful as is, but an ispell dictionary
requires some configuration before use.

Next, the parser and dictionary APIs use INTERNAL arguments for performance
reasons. During creation of a tsvector from text there are a lot of calls to
parsers and dictionaries. The internal structures holding their state may be
rather complex and cannot be represented by any pgsql type, not even as a flat
memory structure.

Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept.

ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...]
    WITH dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...]
    WITH dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
    REPLACE olddictname TO newdictname;
ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS] FOR tokentypename;

Does this look reasonable?

TSConfiguration objects are a different story, since they have only
type-safe dependencies on parsers, locales, and dictionaries. But they
still need some more thought about permissions, because AFAICS mucking
with a configuration can invalidate some other user's data. Do we want
to allow runtime changes in a configuration that existing tsvector
columns already depend on? How can we even recognize whether there is
stored data that will be affected by a configuration change? (AFAICS

This is a very complex task: experienced users may use several configurations
simultaneously. For example, indexing might use a configuration that doesn't
reject stop words, while default searching uses a configuration that does
reject them. BTW, the same effects can be produced by changing a dictionary.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#13 Teodor Sigaev
teodor@sigaev.ru
In reply to: Bruce Momjian (#2)
Re: tsearch_core patch: permissions and security issues

can we hard-code into the backend, and just update for every major
release like we do for encodings?

Sorry, none of them :(. We know of projects that introduce a new parser or a
new dictionary. Configs and maps are changed very often.

#14 Teodor Sigaev
teodor@sigaev.ru
In reply to: Gregory Stark (#3)
Re: tsearch_core patch: permissions and security issues

But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.

ouch. could mucking with a configuration create a corrupt index?

It depends on what you mean by 'corrupted'. It will not be corrupted in the
sense of being unreadable or causing a backend crash. But the usefulness of
such a tsvector column could be limited: not all words will be searchable.

This sounds sort of analogous to the issues collations bring up.

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#12)
Re: tsearch_core patch: permissions and security issues

Teodor Sigaev <teodor@sigaev.ru> writes:

The reason to keep an SQLish interface to dictionaries is simplicity of
configuration. Snowball's stemmers are useful as is, but an ispell dictionary
requires some configuration before use.

Yeah. I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all? It would mean that
dict_init functions would have to guard themselves against invalid
options, but they probably ought to do that anyway. If we did that,
I think we could have a fixed set of dictionaries without too much
problem, and focus on just configurations as being user-alterable.
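
As a rough illustration of what "guarding against invalid options" could mean, here is a Python sketch of an init routine that validates its option string before use. The option names, the key=value format, and the functions parse_initoption and dict_init are assumptions made for the example, not the patch's actual interface.

```python
from typing import Dict

# Hypothetical option names for an ispell-like dictionary.
ALLOWED_OPTIONS = {"dictfile", "afffile", "stopfile"}
REQUIRED_OPTIONS = {"dictfile", "afffile"}


def parse_initoption(raw: str) -> Dict[str, str]:
    """Parse 'key=value, key=value' into a dict, rejecting malformed items."""
    options: Dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"malformed option: {item!r}")
        options[key.strip().lower()] = value.strip()
    return options


def dict_init(raw: str) -> Dict[str, str]:
    """Validate the option string; raise instead of trusting the caller."""
    options = parse_initoption(raw)
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    missing = REQUIRED_OPTIONS - set(options)
    if missing:
        raise ValueError(f"missing options: {sorted(missing)}")
    return options


state = dict_init("DictFile=english.dict, AffFile=english.aff")
```

With checks like these inside the init function, it matters less where the option string is stored, since a bad value is rejected when the dictionary is initialized rather than causing trouble later.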

Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept.

ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...]
    WITH dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...]
    WITH dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
    REPLACE olddictname TO newdictname;
ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS] FOR tokentypename;

Does this look reasonable?

Er ... what's the difference between the second and third forms?

regards, tom lane

#16 Gregory Stark
stark@enterprisedb.com
In reply to: Teodor Sigaev (#14)
Re: tsearch_core patch: permissions and security issues

"Teodor Sigaev" <teodor@sigaev.ru> writes:

But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.

ouch. could mucking with a configuration create a corrupt index?

It depends on what you mean by 'corrupted'. It will not be corrupted in the
sense of being unreadable or causing a backend crash. But the usefulness of
such a tsvector column could be limited: not all words will be searchable.

Am I correct to think of this like changing collations leaving your btree
index "corrupt"? In that case it probably won't cause any backend crash either
but you will get incorrect results. For example, returning different results
depending on whether the index or a full table scan is used.
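
The analogy can be made concrete with a toy Python model. It assumes a tsvector is just a set of lowercased words minus stop words, and that "index" results come from stored vectors while a "full scan" recomputes vectors under the current configuration. The function names and sample documents are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def to_tsvector(text: str, stop_words: Set[str]) -> FrozenSet[str]:
    """Toy stand-in for tsvector creation: lowercase, split, drop stop words."""
    return frozenset(w for w in text.lower().split() if w not in stop_words)


docs: Dict[int, str] = {1: "The Who", 2: "Who are you"}

# Vectors stored while the configuration treated "the" as a stop word.
old_stop: Set[str] = {"the"}
stored = {doc_id: to_tsvector(t, old_stop) for doc_id, t in docs.items()}

# The configuration is then altered; the stored vectors are not rebuilt.
new_stop: Set[str] = set()


def search_stored(word: str) -> List[int]:
    # Models a lookup that trusts previously stored vectors.
    return sorted(i for i, vec in stored.items() if word in vec)


def search_recomputed(word: str) -> List[int]:
    # Models a scan that rebuilds each vector under the current configuration.
    return sorted(i for i, t in docs.items() if word in to_tsvector(t, new_stop))


via_stored = search_stored("the")
via_recomputed = search_recomputed("the")
```

Nothing crashes and nothing is unreadable, but the two access paths disagree about which documents contain the word, which is the kind of silent inconsistency being asked about.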

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#17 Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#15)
Re: tsearch_core patch: permissions and security issues

Tom Lane wrote:

Teodor Sigaev <teodor@sigaev.ru> writes:

The reason to keep an SQLish interface to dictionaries is simplicity of
configuration. Snowball's stemmers are useful as is, but an ispell dictionary
requires some configuration before use.

Yeah. I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all? It would mean that

It should be. Instances of ispell (and synonym, thesaurus) dictionaries differ
only in the dict_initoption part, so there will be only one entry in
pg_ts_dict_template and several in pg_ts_dict.

ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...] WITH
dictname1[, ...];

sets dictionary's list for token's type(s)

ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
REPLACE olddictname TO newdictname;

Replace dictionary to another dictionary in dictionary's list for token's
type(s). This command is very useful for tweaking configuration and for creating
new configuration which differs from already existing one only by pair of
dictionary.

ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS] FOR tokentypename;

Does it look reasonable?
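
One way to read the command forms listed above is as operations on a map from token type to an ordered list of dictionaries. The sketch below is a toy model under assumptions: the method names and the token type and dictionary names are hypothetical, the first two forms are modelled as one operation because they are given a single description, REPLACE without FOR is assumed to apply to every mapped token type, and DROP without IF EXISTS is assumed to fail when no mapping exists.

```python
# Toy model of a configuration's token-type -> dictionary-list mapping.
# Method names are hypothetical and only mirror the proposed command forms.

class Configuration:
    def __init__(self):
        # token type name -> ordered list of dictionary names
        self.mapping = {}

    def set_mapping(self, token_types, dict_names):
        """ADD/ALTER MAPPING ... WITH: set the whole list for each token type."""
        for t in token_types:
            self.mapping[t] = list(dict_names)

    def replace_dictionary(self, old, new, token_types=None):
        """ALTER MAPPING ... REPLACE: swap one dictionary, keep the rest."""
        # Assumption: omitting the token types means "all mapped token types".
        targets = list(self.mapping) if token_types is None else token_types
        for t in targets:
            self.mapping[t] = [new if d == old else d for d in self.mapping[t]]

    def drop_mapping(self, token_type, if_exists=False):
        """DROP MAPPING [IF EXISTS]: remove the list for one token type."""
        if token_type not in self.mapping:
            if if_exists:
                return
            raise KeyError(token_type)
        del self.mapping[token_type]

cfg = Configuration()
cfg.set_mapping(["word", "hword"], ["en_ispell", "en_stem"])
cfg.replace_dictionary("en_ispell", "my_ispell")   # other entries untouched
cfg.drop_mapping("hword")
cfg.drop_mapping("email", if_exists=True)          # no mapping, no error
```

In this model the WITH form overwrites the whole list, while REPLACE edits a single entry in place and leaves the order and the other dictionaries alone.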

Er ... what's the difference between the second and third forms?

Those changes are doable within several days. I'd like to make them together with
replacing the FULLTEXT keyword with TEXT SEARCH, as you suggested.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#18Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#15)
Re: tsearch_core patch: permissions and security issues

On Thu, 14 Jun 2007, Tom Lane wrote:

Teodor Sigaev <teodor@sigaev.ru> writes:

The reason to keep an SQL-ish interface to dictionaries is simplicity of
configuration. Snowball's stemmers are useful as is, but an ispell dictionary
requires some configuration before it can be used.

Yeah. I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all? It would mean that
dict_init functions would have to guard themselves against invalid
options, but they probably ought to do that anyway. If we did that,
I think we could have a fixed set of dictionaries without too much
problem, and focus on just configurations as being user-alterable.
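
What "guarding against invalid options" might look like can be sketched as follows. This is a hedged illustration only: the "key=value, key=value" text format, the option names in `ALLOWED_OPTIONS`, and the function name `parse_initoption` are assumptions made for the example, not the patch's actual option syntax.

```python
# Sketch of an init function that validates its options before using them.
# The "key=value, key=value" format and the option names are assumptions.

ALLOWED_OPTIONS = {"dictfile", "afffile", "stopfile"}

def parse_initoption(initoption):
    """Parse 'key=value, key=value' text, rejecting anything unexpected."""
    options = {}
    for item in initoption.split(","):
        item = item.strip()
        if not item:
            continue                      # tolerate empty items
        key, sep, value = item.partition("=")
        key, value = key.strip().lower(), value.strip()
        if not sep or not value:
            raise ValueError(f"malformed option: {item!r}")
        if key not in ALLOWED_OPTIONS:
            raise ValueError(f"unrecognized option: {key!r}")
        if key in options:
            raise ValueError(f"duplicate option: {key!r}")
        options[key] = value
    return options

opts = parse_initoption("DictFile=english.dict, AffFile=english.aff")
```

The point of the sketch is that the checks live in the init code itself, so they still apply if the option string is supplied through a user-alterable configuration entry.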

Currently, all the dictionaries we provide are template dictionaries,
so users can change only their parameters.

But there are reasons to allow users to register new templates, and in fact we
know of people/projects with application-dependent dictionaries.
How could they dump/reload their dictionaries?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#19Teodor Sigaev
teodor@sigaev.ru
In reply to: Oleg Bartunov (#18)
Re: tsearch_core patch: permissions and security issues

But there are reasons to allow users to register new templates, and in fact
we know of people/projects with application-dependent dictionaries. How
could they dump/reload their dictionaries?

The same way as pg_am does.


#20Oleg Bartunov
oleg@sai.msu.su
In reply to: Bruce Momjian (#16)
Re: tsearch_core patch: permissions and security issues

On Thu, 14 Jun 2007, Gregory Stark wrote:

"Teodor Sigaev" <teodor@sigaev.ru> writes:

But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.

Ouch. Could mucking with a configuration create a corrupt index?

It depends on what you mean by 'corrupted'. It will not be corrupted in the sense
of being unreadable or causing a backend crash. But the usefulness of such a
tsvector column could be limited: not all words will be searchable.

Am I correct to think of this like changing collations leaving your btree
index "corrupt"? In that case it probably won't cause any backend crash either
but you will get incorrect results. For example, returning different results
depending on whether the index or a full table scan is used.

You're correct. But we can't protect users from all possible errors.
On the other hand, we need some way to help users identify which fts
configuration was used to produce a tsvector. For example, a comment on
the tsvector column would be useful, but we don't know how to do this
automatically.
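
The idea of recording which configuration produced a vector can be sketched as a toy model. Everything here is hypothetical (`TaggedVector`, `make_vector`, `matches`, and the configuration names); it only illustrates keeping the configuration name next to the stored lexemes and is not something the patch does.

```python
# Toy model: keep the producing configuration's name next to the lexemes,
# so a mismatch can be detected instead of silently giving partial results.
# All names are hypothetical.
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class TaggedVector:
    lexemes: FrozenSet[str]
    config_name: str   # which configuration produced the lexemes

def make_vector(text: str, config_name: str,
                normalize: Callable[[str], str]) -> TaggedVector:
    """Normalize every word and remember the configuration that did it."""
    return TaggedVector(frozenset(normalize(w) for w in text.split()),
                        config_name)

def matches(vector: TaggedVector, query_word: str, config_name: str,
            normalize: Callable[[str], str]) -> bool:
    """Refuse to compare a query against a vector from another configuration."""
    if vector.config_name != config_name:
        raise ValueError(
            f"vector built with {vector.config_name!r}, "
            f"query uses {config_name!r}")
    return normalize(query_word) in vector.lexemes

vec = make_vector("Fast Cars", "cfg_lower", str.lower)
found = matches(vec, "CARS", "cfg_lower", str.lower)
```

Tagging by name alone would not notice a configuration that was altered in place while keeping its name, so a sketch like this would need some notion of a version as well.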

Regards,
Oleg
