Feature: Add Greek language fulltext search

Started by Panagiotis Mavrogiorgos about 7 years ago | 5 messages | hackers
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Panagiotis Mavrogiorgos (#1)
Re: Feature: Add Greek language fulltext search

Panagiotis Mavrogiorgos <pmav99@gmail.com> writes:

Last November Snowball added support for the Greek language [1]. Following the
instructions [2], I wrote a patch that adds full-text search for Greek in
Postgres. The patch is attached.

Cool!

I would appreciate any feedback that will help in getting this merged.

We're past the deadline for submitting features for v12, but please
register this patch in the first v13 commitfest so that we remember
about it when the time comes:

https://commitfest.postgresql.org/23/

regards, tom lane

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Panagiotis Mavrogiorgos (#1)
Re: Feature: Add Greek language fulltext search

On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:

Last November Snowball added support for the Greek language [1]. Following
the instructions [2], I wrote a patch that adds full-text search for
Greek in Postgres. The patch is attached.

I have committed a full sync from the upstream Snowball repository,
which pulled in the new Greek stemmer.

Could you please clarify where you got the stopword list from? The
README says those need to be downloaded separately, but I wasn't able to
find the download location. It would be good to document this, for
example in the commit message. I haven't committed the stopword list yet.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4 Panagiotis Mavrogiorgos
pmav99@gmail.com
In reply to: Peter Eisentraut (#3)
Re: Feature: Add Greek language fulltext search

On Thu, Jul 4, 2019 at 1:39 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:

Last November Snowball added support for the Greek language [1]. Following
the instructions [2], I wrote a patch that adds full-text search for
Greek in Postgres. The patch is attached.

I have committed a full sync from the upstream Snowball repository,
which pulled in the new Greek stemmer.

Could you please clarify where you got the stopword list from? The
README says those need to be downloaded separately, but I wasn't able to
find the download location. It would be good to document this, for
example in the commit message. I haven't committed the stopword list yet.

Thank you Peter,

Here is the repo with the stop words:
https://github.com/pmav99/greek_stopwords
The list is based on an earlier publication, with modifications by me. All
the relevant info is on GitHub.

Disclaimer 1: The list has not been validated by an expert.

Disclaimer 2: There are other stop-word lists on the internet, but they are
less complete and they also include Ancient Greek words. Furthermore, my
testing showed that Snowball needs to handle accents (tonous) and ς (teliko
sigma, the word-final sigma) in a special way if you want the stemmer to
work with capitalized words too.

https://github.com/Xangis/extra-stopwords/blob/master/greek
https://github.com/stopwords-iso/stopwords-el/tree/master/raw
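
To make the accent and final-sigma point concrete, here is a minimal Python
sketch of one possible pre-normalization step. It is an illustration only and
is not what Snowball or PostgreSQL does internally. The function names
(strip_tonos, normalize_greek) are hypothetical, and the sketch assumes
monotonic Modern Greek text, where the only accent is the tonos.

```python
import unicodedata

# U+0301 COMBINING ACUTE ACCENT is how the tonos appears after NFD decomposition.
TONOS = "\u0301"


def strip_tonos(text: str) -> str:
    """Remove the tonos but keep other marks such as the dialytika."""
    decomposed = unicodedata.normalize("NFD", text)
    without_tonos = decomposed.replace(TONOS, "")
    return unicodedata.normalize("NFC", without_tonos)


def normalize_greek(word: str) -> str:
    """Fold case, drop the tonos, and map final sigma to regular sigma.

    Assumption: treating 'ς' and 'σ' as the same letter is acceptable for
    matching purposes; a real stemmer may need a different convention.
    """
    lowered = word.lower()
    unaccented = strip_tonos(lowered)
    return unaccented.replace("ς", "σ")


if __name__ == "__main__":
    # An all-caps word carries no tonos, so without normalization it would
    # not match its lowercase, accented spelling.
    for word in ("οδός", "Οδός", "ΟΔΟΣ"):
        print(word, "->", normalize_greek(word))
```

With this kind of folding, the capitalized and lowercase spellings of a word
reduce to the same string before they reach the stemmer or the stop-word
lookup, so the stop-word file would need to be normalized the same way.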

all the best,
Panagiotis

#5 Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Peter Eisentraut (#3)
Re: Feature: Add Greek language fulltext search

On 7/4/19 1:39 PM, Peter Eisentraut wrote:

On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:

Last November Snowball added support for the Greek language [1]. Following
the instructions [2], I wrote a patch that adds full-text search for
Greek in Postgres. The patch is attached.

I have committed a full sync from the upstream Snowball repository,
which pulled in the new Greek stemmer.

Could you please clarify where you got the stopword list from? The
README says those need to be downloaded separately, but I wasn't able to
find the download location. It would be good to document this, for
example in the commit message. I haven't committed the stopword list yet.

Thanks. I noticed that Snowball pushed a new commit related to the Greek
stemmer a few days after your sync:
https://github.com/snowballstem/snowball/commit/533602101f963eeb0c38343d94c428ceef740c0c

As Snowball seems to have no policy for stable releases, I don't know what
the best way to keep in sync is :(