new full text search configurations
I checked the new Snowball site http://snowballstem.org/ and found that
several new stemmers have appeared (as external contributions):
- Irish and Czech <http://snowballstem.org/otherapps/oregan/>
- Object Pascal codegenerator for Snowball
<http://snowballstem.org/otherapps/pascal/>
- Two stemmers for Romanian <http://snowballstem.org/otherapps/romanian/>
- Hungarian <http://snowballstem.org/algorithms/hungarian/stemmer.html>
- Turkish <http://snowballstem.org/algorithms/turkish/stemmer.html>
- Armenian <http://snowballstem.org/algorithms/armenian/stemmer.html>
- Basque (Euskera)
<http://snowballstem.org/algorithms/basque/stemmer.html>
- Catalan <http://snowballstem.org/algorithms/catalan/stemmer.html>
Some of them are not in our list of default configurations. Since these
are external, not official, stemmers, it would be nice if people looked at
them and tested them. If they are fine, we can prepare new configurations
for 9.6.
\dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
 public     | english_ns |
(17 rows)
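For anyone who wants to look at how the existing stemmers behave before comparing them with the external ones, here is a minimal sketch. It assumes a stock installation where the Snowball dictionaries are named <language>_stem (english_stem is used here); the sample words are arbitrary and no particular output is implied.

```sql
-- Same information as \dF, read from the system catalog.
SELECT n.nspname AS schema, c.cfgname AS name
FROM pg_ts_config c
JOIN pg_namespace n ON n.oid = c.cfgnamespace
ORDER BY 1, 2;

-- Stem a single word with one Snowball dictionary.
SELECT ts_lexize('english_stem', 'configurations');

-- Run a whole phrase through a configuration to see the resulting lexemes.
SELECT to_tsvector('english', 'several new stemmers appeared');

-- Show which dictionary handled each token; handy when checking a stemmer.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'several new stemmers appeared');
```

The same queries can be pointed at any other configuration or dictionary from the list above to compare its results with the sample vocabularies published on the Snowball site.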
Hi
2015-11-17 17:28 GMT+01:00 Oleg Bartunov <obartunov@gmail.com>:
> I checked new snowball site http://snowballstem.org/ and found several
> new stemmers appeared (as external contributions):
> - Irish and Czech <http://snowballstem.org/otherapps/oregan/>
The Czech Snowball stemmer needs a recheck: five years ago it was not successful in my tests.
Regards
Pavel
> I checked new snowball site http://snowballstem.org/ and found several new
> stemmers appeared (as external contributions):
> - Irish and Czech
> - Object Pascal codegenerator for Snowball
> - Two stemmers for Romanian
> - Hungarian
> - Turkish
> - Armenian
> - Basque (Euskera)
> - Catalan
> Some of them we don't have in our list of default configurations. Since
> these are external, not official stemmers, it'd be nice if people look and
> test them. If they are fine, we can prepare new configurations for 9.6.
We have configurations for the ones included in Snowball, namely
Romanian, Hungarian, and Turkish. I don't know why the others are not
included but only listed on the page as external contributions. It might
be a good idea to wait for someone to commit them upstream.
I have checked the changes to the algorithms [1]. They don't seem to
have been updated much after 2007, but recently a new one for the Tamil
language was added. It might be a good candidate for a new
configuration.
[1]: https://github.com/snowballstem/snowball/commits/master/algorithms
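To make the idea of a new configuration concrete, here is a rough sketch modelled on how the built-in Snowball configurations are set up. It assumes a stemmer for the language has been compiled into the snowball dictionary module so that the template accepts Language = 'tamil'; that is not the case today, and the names tamil_stem and tamil are only illustrative.

```sql
-- Hypothetical: this fails unless a 'tamil' stemmer is built into the
-- snowball dictionary module.
CREATE TEXT SEARCH DICTIONARY tamil_stem (
    TEMPLATE = snowball,
    Language = 'tamil'
);

-- Start from the default parser with no token mappings.
CREATE TEXT SEARCH CONFIGURATION tamil (PARSER = default);

-- Route word-like tokens through the new stemmer.
ALTER TEXT SEARCH CONFIGURATION tamil
    ADD MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part
    WITH tamil_stem;
```

Other token types (numbers, URLs, e-mail addresses and so on) would still need a mapping, typically to the simple dictionary.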
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)