tsearch in core patch

Started by Teodor Sigaev over 18 years ago, 27 messages

Teodor Sigaev

teodor@sigaev.ru

over 18 years ago

http://www.sigaev.ru/misc/tsearch_core-0.52.gz

Plan was:

1) rename FULLTEXT to TEXT SEARCH in SQL command
done

2) rework Snowball stemmers as Tom suggested
done

3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
done

4) remove support for a default configuration per schema. There will be only
one default configuration per locale.
done

5) single encoded files. That will touch snowball, ispell, synonym, thesaurus
and simple dictionaries
done

6) use encoding names instead of locale names in configuration
Ugh. I missed that knowing the encoding doesn't allow us to determine the exact
language --- how many languages use the ISO8859-1 locale? So, it's not done. Tom
pointed out that a locale's name isn't portable, but there aren't many names for
the same locale (ru_RU.UTF-8, ru_RU.UTF8 for example). So it's possible to use an
array of locales instead of one name.

I didn't see comments about the security hole pointed out by Tom, so I repeat:

About security holes in PARSER/DICTIONARY. I see the following ways to resolve it now:
1) Allow only superusers to do CREATE/ALTER/DROP PARSER/DICTIONARY
Disadvantage: hosting users will not be able to change dictionaries
2) Remove CREATE/ALTER/DROP PARSER, split pg_ts_dict into pg_ts_dict_template
and pg_ts_dict, and change CREATE/ALTER/DROP DICTIONARY accordingly
Disadvantage: parsers and dictionary templates will not be dumped/restored;
they would have to be restored manually (just an INSERT into
pg_ts_parser/pg_ts_dict_template)
3) Similar to the previous point, but:
* CREATE/ALTER/DROP PARSER - superuser only
* CREATE/ALTER/DROP DICTIONARY TEMPLATE - superuser only
* CREATE/ALTER/DROP DICTIONARY - allowed to non-superusers
Disadvantage: new command CREATE/ALTER/DROP DICTIONARY TEMPLATE
Which way do we choose? Or am I missing some variant?

I would like to go with option 3)... Comments?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
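
The privilege split proposed in option 3 can be pictured with a small sketch. This is only an illustration of the rule as stated in the message, not code from the patch: the table `SUPERUSER_ONLY` and the function `may_execute` are hypothetical names, and object ownership checks are deliberately left out.

```python
# Hypothetical model of option 3: which object kinds require superuser
# rights for CREATE/ALTER/DROP. Ownership checks are ignored here.
SUPERUSER_ONLY = {
    "PARSER": True,
    "DICTIONARY TEMPLATE": True,
    "DICTIONARY": False,
}


def may_execute(object_kind: str, is_superuser: bool) -> bool:
    """Return True if the role may CREATE/ALTER/DROP this object kind."""
    needs_superuser = SUPERUSER_ONLY[object_kind]
    return is_superuser or not needs_superuser


if __name__ == "__main__":
    for kind in SUPERUSER_ONLY:
        print(kind, "non-superuser allowed:", may_execute(kind, False))
```

Under this reading, a hosting user could still create a dictionary from an existing template, while only a superuser could install the C-level parser or template.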

Hannu Krosing

hannu@skype.net

over 18 years ago

In reply to: Teodor Sigaev (#1)

Re: tsearch in core patch

One fine day, Thu, 2007-06-21 at 21:44, Teodor Sigaev wrote:

http://www.sigaev.ru/misc/tsearch_core-0.52.gz

Plan was:

1) rename FULLTEXT to TEXT SEARCH in SQL command
done

2) rework Snowball stemmer's as Tom suggested
done

3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
done

Why not rename ALTER FULLTEXT CONFIGURATION --> ALTER TEXT SEARCH
CONFIGURATION here too ?

4) remove support of default configuration per scheme. Default configuration
will be only one per locale.
done

5) single encoded files. That will touch snowball, ispell, synonym, thesaurus
and simple dictionaries
done

6) use encoding names instead of locale's names in configuration
Ugh. I missed that knowledge of encoding doesn't allow to determine exact
language

Most languages can be written using the Unicode charset and UTF-8 encoding,
so neither charset nor encoding can be used to determine the language.

--- how do many languages use ISO8859-1 locale?.

ISO8859-1 is an encoding, not a locale.

So, it's not done. Tom
pointed that locale's name isn't portable, but there isn't a lot of names of the
same locale (ru_RU.UTF-8, ru_RU.UTF8 for example). So it's possible to use array
of locales instead of one name.

I didn't see comments about security hole pointed by Tom, so I repeat:

About security holes in PARSER/DICTIONARY. I see following ways to resolve it now:
1) Allow to superuser only to do CREATE/ALTER/DROP PARSER/DICTIONARY
Disadvantage: hosting users will not be able to change dictionaries
2) Remove CREATE/ALTER/DROP PARSER, split pg_ts_dict to pg_ts_dict_template
and pg_ts_dict and accordingly change CREATE/ALTER/DROP DICTIONARY
Disadvantage: parser and dictionary's template will not dump/restore,
it should be restored manually (just a INSERT into
pg_ts_parser/pg_ts_dict_template)
3) Similar to previous point, but:
* CREATE/ALTER/DROP PARSER - super-user only
* CREATE/ALTER/DROP DICTIONARY TEMPLATE - super-user only
* CREATE/ALTER/DROP DICTIONARY - allowed to non-superuser
Disadvantage: new command CREATE/ALTER/DROP DICTIONARY TEMPLATE
Which way do we choose? or I miss some variant?

I would like to go by 3) way... Comments?

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Hannu Krosing (#2)

Re: tsearch in core patch

Hannu Krosing <hannu@skype.net> writes:

One fine day, Thu, 2007-06-21 at 21:44, Teodor Sigaev wrote:

6) use encoding names instead of locale's names in configuration
Ugh. I missed that knowledge of encoding doesn't allow to determine exact
language

Most languages can be written using the Unicode charset and UTF-8 encoding,
so neither charset nor encoding can be used to determine the language.

The recommendation I was making was to use the language name, not the
encoding name, in the user-visible configuration.

regards, tom lane

Teodor Sigaev

teodor@sigaev.ru

over 18 years ago

In reply to: Hannu Krosing (#2)

Re: tsearch in core patch

3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
done

Why not rename ALTER FULLTEXT CONFIGURATION --> ALTER TEXT SEARCH
CONFIGURATION here too ?

It's renamed too.

Most languages can be written using the Unicode charset and UTF-8 encoding,
so neither charset nor encoding can be used to determine the language.

yes

--- how do many languages use ISO8859-1 locale?. 
ISO8859-1 is encoding, not locale.

I meant that if we use the encoding name (for example PG_LATIN1), we can't
distinguish the languages that use that encoding (for example Italian, Finnish
and some more), but with locale names it's possible: it_IT.ISO8859-1,
fi_FI.ISO8859-1

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Teodor Sigaev

teodor@sigaev.ru

over 18 years ago

In reply to: Tom Lane (#3)

Re: tsearch in core patch

The recommendation I was making was to use the language name, not the
encoding name, in the user-visible configuration.

How would it determine the language of the db automatically?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Teodor Sigaev (#5)

Re: tsearch in core patch

Teodor Sigaev wrote:

The recommendation I was making was to use the language name, not the
encoding name, in the user-visible configuration.

How does it determine language of db automatically?

I don't think we are going to do language selection automatically ---
the user is going to have to set tsearch_conf_name.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Teodor Sigaev

teodor@sigaev.ru

over 18 years ago

In reply to: Bruce Momjian (#6)

Re: tsearch in core patch

I don't think we are going to do language selection automatically ---
the user is going to have to set tsearch_conf_name.

Are you suggesting removing a long-lived feature of tsearch? In that case we
don't need the cfglocale (or cfglanguage, as Tom suggested) and cfgdefault
columns in pg_ts_cfg at all. Just set tsearch_conf_name.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Alvaro Herrera

alvherre@commandprompt.com

over 18 years ago

In reply to: Teodor Sigaev (#4)

Re: tsearch in core patch

Teodor Sigaev wrote:

--- how do many languages use ISO8859-1 locale?. 
ISO8859-1 is encoding, not locale.
I meant that if we use the encoding name (for example PG_LATIN1), we can't
distinguish the languages that use that encoding (for example Italian, Finnish
and some more), but with locale names it's possible: it_IT.ISO8859-1,
fi_FI.ISO8859-1

I don't understand. Why use "it_IT.ISO8859-1"? You just need to know
the language, so "it" is enough. The _IT part specifies that it's the
italian spoken in Italy. This may be irrelevant in most cases, but
consider that pt_PT and pt_BR are AFAIK somewhat different languages.

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure. Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"Nadie esta tan esclavizado como el que se cree libre no siendolo" (Goethe)

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Teodor Sigaev (#7)

Re: tsearch in core patch

Teodor Sigaev <teodor@sigaev.ru> writes:

I don't think we are going to do language selection automatically ---
the user is going to have to set tsearch_conf_name.

Are you suggesting removing a long-lived feature of tsearch? In that case we
don't need the cfglocale (or cfglanguage, as Tom suggested) and cfgdefault
columns in pg_ts_cfg at all. Just set tsearch_conf_name.

Is the point here for initdb to be able to establish a sane default
initially? Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).

regards, tom lane
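
The guess described above (ru_RU -> russian) can be sketched as a lookup on the first component of the locale name. This is a minimal illustration, not what initdb does: the table `LANGUAGE_BY_CODE` and the function `guess_language` are hypothetical, and the table holds only a few entries taken from locales mentioned in this thread.

```python
# Hypothetical table: language code (first locale component) -> language name.
LANGUAGE_BY_CODE = {
    "ru": "russian",
    "es": "spanish",
    "it": "italian",
    "fi": "finnish",
}


def guess_language(locale_name):
    """Guess a language name from a POSIX-style locale, or return None."""
    # Keep only what precedes the first "_" (and any "." for names
    # without a territory), e.g. "ru_RU.UTF-8" -> "ru".
    code = locale_name.split("_", 1)[0].split(".", 1)[0].lower()
    return LANGUAGE_BY_CODE.get(code)


if __name__ == "__main__":
    print(guess_language("ru_RU"))
    print(guess_language("C"))
```

A locale such as C carries no language component, so the sketch returns None there and leaves the decision to the caller.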

#10

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Alvaro Herrera (#8)

Re: tsearch in core patch

Alvaro Herrera <alvherre@commandprompt.com> writes:

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure. Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names. AFAIK, just about
everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
after that (if any) that is not too consistent across platforms.
So I see no problem in distinguishing between pt_PT and pt_BR if it
turns out we have to. The trick is to not look at any more of the
locale name than that; and if we standardize on "stopword files are
UTF8" then I don't think we need to.

regards, tom lane
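
The "trick" of not looking past the language and territory can be sketched as follows. It assumes POSIX-style names of the form language_TERRITORY.encoding@modifier; the function name `strip_encoding` is made up for this illustration.

```python
def strip_encoding(locale_name):
    """Drop the encoding and modifier parts of a POSIX-style locale name."""
    # Remove an optional "@modifier" suffix first, then the ".encoding" part.
    without_modifier = locale_name.split("@", 1)[0]
    return without_modifier.split(".", 1)[0]


if __name__ == "__main__":
    # The two spellings mentioned earlier in the thread compare equal
    # once the encoding part is ignored.
    print(strip_encoding("ru_RU.UTF-8") == strip_encoding("ru_RU.UTF8"))
```

With the encoding part gone, pt_PT and pt_BR remain distinguishable, which is the property asked for above.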

#11

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Tom Lane (#10)

Re: tsearch in core patch

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure. Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names. AFAIK, just about
everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
after that (if any) that is not too consistent across platforms.
So I see no problem in distinguishing between pt_PT and pt_BR if it
turns out we have to. The trick is to not look at any more of the
locale name than that; and if we standardize on "stopword files are
UTF8" then I don't think we need to.

OK, and the open question is when do we do this default setting. If we
do it in initdb then we can isolate all the detection there.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12

Oleg Bartunov

oleg@sai.msu.su

over 18 years ago

In reply to: Bruce Momjian (#11)

Re: tsearch in core patch

On Fri, 22 Jun 2007, Bruce Momjian wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure. Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names. AFAIK, just about
everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
after that (if any) that is not too consistent across platforms.
So I see no problem in distinguishing between pt_PT and pt_BR if it
turns out we have to. The trick is to not look at any more of the
locale name than that; and if we standardize on "stopword files are
UTF8" then I don't think we need to.

OK, and the open question is when do we do this default setting. If we
do it in initdb then we can isolate all the detection there.

We can do that at initdb time, but we still have to decide how to map the
language part of the locale name to a human-readable language name. Are we
going to hardcode it?

That's not friendly for hosting setups, where people often have no access
to postgresql.conf, so they would need to remember to set tsearch_conf_name.
It could be solved with the 'alter user ... set tsearch_conf_name' command, though.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#13

Magnus Hagander

magnus@hagander.net

over 18 years ago

In reply to: Tom Lane (#10)

Re: tsearch in core patch

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure. Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names. AFAIK, just about
everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
after that (if any) that is not too consistent across platforms.

That may have been true until we started supporting Windows...
Swedish_Sweden.1252 is what I get on my machine, for example. Principle
is the same, but values certainly aren't.

//Magnus

#14

Alvaro Herrera

alvherre@commandprompt.com

over 18 years ago

In reply to: Magnus Hagander (#13)

Re: tsearch in core patch

Magnus Hagander wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure. Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names. AFAIK, just about
everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
after that (if any) that is not too consistent across platforms.

That may have been true until we started supporting Windows...
Swedish_Sweden.1252 is what I get on my machine, for example. Principle
is the same, but values certainly aren't.

Well, at least the name is not itself translated, so a mapping table is
not right out of the question. If they had put a name like
"Español_Chile" instead of "Spanish_Chile" we would be in serious
trouble.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
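
A mapping table for Windows-style names could look roughly like this. The sketch assumes names shaped like Language_Country.codepage, as in the Swedish_Sweden.1252 example; `WINDOWS_LANGUAGE_WORDS` and `language_from_windows_locale` are hypothetical, and the table is illustrative rather than complete.

```python
# Hypothetical table: English language word used in Windows locale names
# -> tsearch language name. Entries are illustrative, not exhaustive.
WINDOWS_LANGUAGE_WORDS = {
    "Swedish": "swedish",
    "Spanish": "spanish",
    "Russian": "russian",
}


def language_from_windows_locale(locale_name):
    """Map a Language_Country.codepage name to a language, or None."""
    # Drop the code page ("Swedish_Sweden.1252" -> "Swedish_Sweden").
    without_codepage = locale_name.split(".", 1)[0]
    # Keep the language word ("Swedish_Sweden" -> "Swedish").
    language_word = without_codepage.split("_", 1)[0]
    return WINDOWS_LANGUAGE_WORDS.get(language_word)


if __name__ == "__main__":
    print(language_from_windows_locale("Swedish_Sweden.1252"))
```

Because the language word is always in English, a fixed table keyed on it is enough; a POSIX name such as sv_SE simply misses the table and has to be handled separately.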

#15

Michael Glaesemann

grzm@seespotcode.net

over 18 years ago

In reply to: Tom Lane (#9)

Re: tsearch in core patch

On Jun 22, 2007, at 9:28 , Tom Lane wrote:

Is the point here for initdb to be able to establish a sane default
initially? Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).

How would this work for initdb with locale C?

Michael Glaesemann
grzm seespotcode net

#16

Noname

teodor@sigaev.ru

over 18 years ago

In reply to: Michael Glaesemann (#15)

Re: tsearch in core patch

That may have been true until we started supporting Windows...
Swedish_Sweden.1252 is what I get on my machine, for example. Principle
is the same, but values certainly aren't.

Well, at least the name is not itself translated, so a mapping table is
not right out of the question. If they had put a name like
"Español_Chile" instead of "Spanish_Chile" we would be in serious
trouble.

I don't think so; otherwise you couldn't type or display it to change the
locale :).

So, final proposal:
rename cfglocale to cfglanguages and store in it an array of language names
produced from the first part of locale names:
russian '{ru_RU, Russian_Russia}'
spanish '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'

Comments?

Are there any obstacles to using GIN indexes in pg_catalog?

#17

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Michael Glaesemann (#15)

Re: tsearch in core patch

Michael Glaesemann wrote:

On Jun 22, 2007, at 9:28 , Tom Lane wrote:

Is the point here for initdb to be able to establish a sane default
initially? Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).

How would this work for initdb with locale C?

Yea, that's a problem. I am thinking we should just avoid the entire
issue and require it to be set by the user, and throw an error if the
configuration is not set.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#18

Tatsuo Ishii

ishii@sraoss.co.jp

over 18 years ago

In reply to: Michael Glaesemann (#15)

Re: tsearch in core patch

On Jun 22, 2007, at 9:28 , Tom Lane wrote:

Is the point here for initdb to be able to establish a sane default
initially? Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).

How would this work for initdb with locale C?

I'm worrying about that too.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

#19

Alvaro Herrera

alvherre@commandprompt.com

over 18 years ago

In reply to: Noname (#16)

Re: tsearch in core patch

teodor@sigaev.ru wrote:

So, final proposal:
rename cfglocale to cfglanguages and store in it an array of language names
produced from the first part of locale names:
russian '{ru_RU, Russian_Russia}'
spanish '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'

Comments?

Why not do it the other way around?
es_ES spanish
Spanish_Spain spanish
ru_RU russian
pt_BR portuguese_brazil

That way you don't need any funny index. Or do you need the list of
locales for each language? (but even if you do, you can easily obtain it
by indexing both columns separately using btrees anyway)

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"I can see support will not be a problem. 10 out of 10." (Simon Wittber)
(http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
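
The two layouts under discussion carry the same information, which a short sketch can make concrete. The dictionaries below stand in for catalog rows and are purely illustrative: `CFG_LANGUAGES` mirrors the array-per-configuration proposal, and `invert` derives the one-row-per-locale form suggested in the reply. None of these names come from the patch.

```python
# Layout A (array per configuration), using the values from the proposal.
CFG_LANGUAGES = {
    "russian": ["ru_RU", "Russian_Russia"],
    "spanish": ["es_ES", "es_CL", "Spanish_Spain", "Spanish_Chile"],
}


def config_by_array_scan(locale_name):
    """Layout A lookup: find the configuration whose array holds the locale."""
    for config_name, locales in CFG_LANGUAGES.items():
        if locale_name in locales:
            return config_name
    return None


def invert(cfg_languages):
    """Build layout B (one row per locale) from layout A."""
    locale_to_config = {}
    for config_name, locales in cfg_languages.items():
        for locale_name in locales:
            if locale_name in locale_to_config:
                raise ValueError("locale listed twice: " + locale_name)
            locale_to_config[locale_name] = config_name
    return locale_to_config


# Layout B: a plain key lookup, the kind an ordinary btree index serves.
LOCALE_TO_CONFIG = invert(CFG_LANGUAGES)

if __name__ == "__main__":
    print(config_by_array_scan("Spanish_Chile"))
    print(LOCALE_TO_CONFIG["Spanish_Chile"])
```

In layout B the configuration name appears on several rows but still names a single configuration, while the locale column is the unique key.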

#20

Alvaro Herrera

alvherre@commandprompt.com

over 18 years ago

In reply to: Noname (#16)

Re: tsearch in core patch

teodor@sigaev.ru wrote:

Why not do it the other way around?
es_ES spanish
Spanish_Spain spanish
ru_RU russian
pt_BR portuguese_brazil

That way you don't need any funny index. Or do you need the list of
locales for each language? (but even if you do, you can easily obtain it
by indexing both columns separately using btrees anyway)

Yes, that's possible, but it increases the number of identical configurations:
russian_win Russian_Russia
russian_unix ru_RU

They don't differ except in locale name.

But why do you need them to be different at all? Just make it

russian Russian_Russia
russian ru_RU

Does that not work for some reason?

What I was really suggesting was having a table mapping locale names
into "tsearch languages". Then the configuration could be made based on
the language, not on the locale name. So the stopword list is for
"russian", regardless of whether the locale is Russian_Russia or ru_RU.

Is this only for the stopword list, or does it also affect selecting a
stemmer?

Note: it's possible that the stopword list is different for brazilian
portuguese than portuguese portuguese, which is why I was suggesting
using a language "portuguese_brazil" and not just "portuguese". Whereas
you need a single stopword list for all the countries speaking spanish,
which is why you need only one language called spanish.

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Llegará una época en la que una investigación diligente y prolongada sacará
a la luz cosas que hoy están ocultas" (Séneca, siglo I)

#21

Noname

teodor@sigaev.ru

over 18 years ago

In reply to: Alvaro Herrera (#19)

Re: tsearch in core patch

Why not do it the other way around?
es_ES spanish
Spanish_Spain spanish
ru_RU russian
pt_BR portuguese_brazil

That way you don't need any funny index. Or do you need the list of
locales for each language? (but even if you do, you can easily obtain it
by indexing both columns separately using btrees anyway)

Yes, that's possible, but it increases the number of identical configurations:
russian_win Russian_Russia
russian_unix ru_RU

They don't differ except in locale name.

#22

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Tatsuo Ishii (#18)

Re: tsearch in core patch

Tatsuo Ishii <ishii@sraoss.co.jp> writes:

On Jun 22, 2007, at 9:28 , Tom Lane wrote:

Is the point here for initdb to be able to establish a sane default
initially? Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).

How would this work for initdb with locale C?

I'm worrying about that too.

I would be surprised if C locale defaulted to anything except English.
I suppose it would be sensible to add a switch to allow people to select
a different language. In any case, the only thing initdb would be doing
would be setting up an initial value of a table entry or GUC variable,
so you could always change it yourself later; it may not be worth
sweating too much about this.

regards, tom lane
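
The default suggested here, with a switch to override it, might be sketched like this. It is an assumption-laden illustration only: `initial_language` and its `requested` argument are hypothetical stand-ins for whatever switch initdb would offer, and locales other than C/POSIX are left to a separate guessing step.

```python
def initial_language(locale_name, requested=None):
    """Pick the initial text search language at cluster creation time."""
    # An explicit request (the hypothetical switch) always wins.
    if requested:
        return requested
    # C and POSIX carry no language information; default to English.
    if locale_name in ("C", "POSIX"):
        return "english"
    # Anything else is left to a guess based on the locale name.
    return None


if __name__ == "__main__":
    print(initial_language("C"))
    print(initial_language("C", requested="russian"))
```

Since the result only seeds an initial setting, a wrong default can be changed afterwards.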

#23

Euler Taveira de Oliveira

euler@timbira.com

over 18 years ago

In reply to: Alvaro Herrera (#20)

Re: tsearch in core patch

Alvaro Herrera wrote:

What I was really suggesting was having a table mapping locale names
into "tsearch languages". Then the configuration could be made based on
the language, not on the locale name. So the stopword list is for
"russian", regardless of whether the locale is Russian_Russia or ru_RU.

Agreed. But I'm afraid we couldn't map all of the locale names in a
right way. Man, it's a large list. ;)

Is this only for the stopword list, or does it also affect selecting a
stemmer?

Both.

Note: it's possible that the stopword list is different for brazilian
portuguese than portuguese portuguese, which is why I was suggesting
using a language "portuguese_brazil" and not just "portuguese". Whereas
you need a single stopword list for all the countries speaking spanish,
which is why you need only one language called spanish.

Indeed it's possible for portuguese, because we have some words that are
written in different ways, e.g.,
pt_BR     pt_PT     english
Mônica    Mónica    Monica
ação      acção     action
Irã       Irão      Iran
...

Will it be possible to disable stemming or stopword removal? I'm asking
this 'cause sometimes stemming doesn't lead to good results and/or
stopwords are relevant. Maybe these could be GUC variables
('enable_stemming' and 'enable_stopwords').

--
Euler Taveira de Oliveira
http://www.timbira.com/

#24

Oleg Bartunov

oleg@sai.msu.su

over 18 years ago

In reply to: Euler Taveira de Oliveira (#23)

Re: tsearch in core patch

On Sat, 23 Jun 2007, Euler Taveira de Oliveira wrote:

Will it be possible to disable stemming or stopword removal? I'm asking
this 'cause sometimes stemming doesn't lead to good results and/or
stopwords are relevant. Maybe these could be GUC variables
('enable_stemming' and 'enable_stopwords').

Just use another configuration.
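
The idea that stemming and stopword removal are properties of the chosen configuration, rather than global switches, can be shown with a toy model. Nothing below is tsearch code: the configuration names, the stopword set and `toy_stem` (a crude suffix stripper, not Snowball) are all invented for the illustration.

```python
STOPWORDS = {"the", "a", "of"}


def toy_stem(word):
    """Crude suffix stripping for illustration only; not a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def no_stem(word):
    """Identity function: leave the word unchanged."""
    return word


# Two hypothetical configurations: one stems and drops stopwords,
# the other keeps every word as written.
CONFIGURATIONS = {
    "english_toy": {"stopwords": STOPWORDS, "stem": toy_stem},
    "simple_toy": {"stopwords": set(), "stem": no_stem},
}


def to_lexemes(text, config_name):
    """Turn text into lexemes according to the named configuration."""
    config = CONFIGURATIONS[config_name]
    words = [word.lower() for word in text.split()]
    return [config["stem"](w) for w in words if w not in config["stopwords"]]


if __name__ == "__main__":
    print(to_lexemes("The indexing of documents", "english_toy"))
    print(to_lexemes("The indexing of documents", "simple_toy"))
```

Choosing the second configuration has the same effect as the proposed enable_stemming and enable_stopwords settings being off, without adding new variables.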

#25

Tatsuo Ishii

ishii@sraoss.co.jp

over 18 years ago

In reply to: Tom Lane (#22)

Re: tsearch in core patch

I would be surprised if C locale defaulted to anything except English.

Don't be surprised. The mechanism of collation is too simple for
Japanese Kanji, and locale is not useful for Japanese anyway. That's
why Japanese installations of PostgreSQL tend to use C locale.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

I suppose it would be sensible to add a switch to allow people to select
a different language. In any case, the only thing initdb would be doing
would be setting up an initial value of a table entry or GUC variable,
so you could always change it yourself later; it may not be worth
sweating too much about this.

regards, tom lane

#26

Teodor Sigaev

teodor@sigaev.ru

over 18 years ago

In reply to: Alvaro Herrera (#20)

Re: tsearch in core patch

But why do you need them to be different at all? Just make it
russian Russian_Russia
russian ru_RU

Does that not work for some reason?

I'd like configuration names to be unique. So, if a user sets the GUC variable
or calls a function with a configuration's name, then postgres should not have
a choice --- it should use exactly the configuration named.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#27

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Teodor Sigaev (#26)

Re: tsearch in core patch

Teodor Sigaev <teodor@sigaev.ru> writes:

But why do you need them to be different at all? Just make it
russian Russian_Russia
russian ru_RU

Does that not work for some reason?

I'd like configuration names to be unique. So, if a user sets the GUC variable
or calls a function with a configuration's name, then postgres should not have
a choice --- it should use exactly the configuration named.

Sure, but the configuration name in this example is "russian", and it's
unique, no?

regards, tom lane