Full text search question: "01.Bez." --> "Erster Bezirk"

Started by Johann Höchtl, about 10 years ago, 3 messages, general
#1 Johann Höchtl
johann.hoechtl@gmail.com

I fear I have an involved challenge concerning FTS.

Assume I have the following text in a column:

Graz,06.Bez.:Blah

This parses as:
SELECT alias, description, token
FROM ts_debug('german', 'Graz,06.Bez.:Blah');

   alias   |   description   | token
-----------+-----------------+--------
 asciiword | Word, all ASCII | Graz
 blank     | Space symbols   | ,
 host      | Host            | 06.Bez
 blank     | Space symbols   | .:
 asciiword | Word, all ASCII | Blah

"Bez." is the abbreviation for "Bezirk" (German for "district"), so
"06.Bez." means "6th district".

My first problem might be that the parser identifies "06.Bez" as a host
token, but ...

I already defined a synonym dictionary to enable searching for "Bezirk",
when there is only "Bez." in the database:

file: bevaddress_host.syn:
01.bez bezirk
06.bez bezirk
<snip some more rows>

CREATE TEXT SEARCH DICTIONARY bevaddress_host_syn (
    TEMPLATE = synonym,
    SYNONYMS = bevaddress_host
);

ALTER TEXT SEARCH CONFIGURATION german
    ALTER MAPPING FOR host WITH bevaddress_host_syn, simple;
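
A minimal sketch of how the mapping above might be checked, assuming the dictionary and the altered 'german' configuration exist exactly as shown. What comes back depends on the actual contents of bevaddress_host.syn on the server, so no output is shown here.

```sql
-- Ask the synonym dictionary directly what it returns for a single token.
SELECT ts_lexize('bevaddress_host_syn', '06.bez');

-- Re-run ts_debug with the dictionary and lexemes columns to see which
-- dictionary handled the host token after the ALTER MAPPING.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('german', 'Graz,06.Bez.:Blah');

-- Check whether a search for "Bezirk" matches the abbreviated text.
SELECT to_tsvector('german', 'Graz,06.Bez.:Blah')
       @@ to_tsquery('german', 'Bezirk') AS matches;
```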

What I cannot work out is how to make a search for "Erster Bezirk"
("First district") match e.g. "01.Bez."
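
One conceivable way to get there, offered only as a sketch and not taken from this thread or from any linked material: expand the abbreviation to the written-out ordinal before indexing, so that the document text and the query both pass through the same 'german' configuration. The function name expand_bezirk and the two hard-coded districts are hypothetical; a real setup would need one rule per district.

```sql
-- Hypothetical helper: rewrites "01.Bez." / "06.Bez." into written-out form.
-- \m anchors the match at the start of a word; 'gi' = global, case-insensitive.
CREATE FUNCTION expand_bezirk(txt text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT regexp_replace(
               regexp_replace(txt, '\m01\.Bez\.', ' Erster Bezirk ', 'gi'),
               '\m06\.Bez\.', ' Sechster Bezirk ', 'gi');
$$;

-- Index the expanded text and search with the written-out phrase.
SELECT to_tsvector('german', expand_bezirk('Graz,01.Bez.:Blah'))
       @@ plainto_tsquery('german', 'Erster Bezirk') AS matches;
```

Because the written-out form is produced before to_tsvector runs, this variant does not depend on how the parser classifies "01.Bez".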

Thank you for your help, Johann

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Dane Foster
studdugie@gmail.com
In reply to: Johann Höchtl (#1)
Re: Full text search question: "01.Bez." --> "Erster Bezirk"

Hello,


On Sat, Mar 12, 2016 at 11:40 AM, Johann Höchtl <johann.hoechtl@gmail.com>
wrote:

[...]

At the time of writing I haven't seen any replies to your post, so you
may not be aware that an answer to your specific question was provided
in a blog post: http://obartunov.livejournal.com/185579.html

Regards,
Dane

#3 Johann Höchtl
johann.hoechtl@gmail.com
In reply to: Dane Foster (#2)
Re: Full text search question: "01.Bez." --> "Erster Bezirk"

Thank you, I was in direct contact with the author. All my issues and
questions got sorted out. It's working perfectly!

Thank you, Johann

2016-03-13 18:32 GMT+01:00 Dane Foster <studdugie@gmail.com>:

[...]