text search synonym dictionary anomaly with numbers

Started by Richard Greenwood over 14 years ago, 4 messages, general
#1 Richard Greenwood
richard.greenwood@gmail.com

I am working with street address data in which 'first st' has been
entered as '1 st' and so on. So I have created a text search
dictionary with entries:
first 1
1st 1
And initially it seems to be working properly:

SELECT ts_lexize('rwg_synonym','first');
ts_lexize
-----------
{1}

SELECT ts_lexize('rwg_synonym','1st');
ts_lexize
-----------
{1}

But my queries on '1st' are not returning the expected results:

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1');
count
-------
403 <- this is what I want

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('first');
count
-------
403 <- this is also good

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1st');
count
-------
4 <- this is not good. There are 4 records that do have '1st',
but why am I not getting 403 records?

Thanks for reading,
Rich

--
Richard Greenwood
richard.greenwood@gmail.com
www.greenwoodmap.com

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: Richard Greenwood (#1)
Re: text search synonym dictionary anomaly with numbers

Richard,

You should check your mapping - '1st' belongs to 'numword' and may be processed
in a different way than 'first' or '1'.
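
One way to check this is ts_debug, which reports the token type the parser assigns to each token and the dictionaries consulted for it. This is a minimal sketch, assuming the 'english' configuration used in this thread; the input string is only an illustration.

```sql
-- Show how the parser classifies each token and which dictionaries
-- the 'english' configuration consults for that token type.
-- The "alias" column is the token type (asciiword, numword, uint, ...).
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'first 1st 1');
```

If '1st' is reported with a different alias than 'first', only the dictionaries mapped to that alias are applied to it.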

Oleg
On Sat, 26 Nov 2011, Richard Greenwood wrote:

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1st');
count
-------
4 <- this is not good. There are 4 records that do have '1st',
but why am I not getting 403 records?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3 Richard Greenwood
richard.greenwood@gmail.com
In reply to: Oleg Bartunov (#2)
Re: text search synonym dictionary anomaly with numbers

Oleg,

Thank you. I am sure that you have identified my problem.

\dF+ english (output below) lists my dictionary, which is named
'rwg_synonym', before numword, so I would have thought that my
dictionary would have normalized '1st' to '1' before the numword
dictionary was reached. Maybe this question belongs in a new thread,
but I do thank you for helping me to look in the correct place.

Best regards,
Rich

fremontwy=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
      Token      |       Dictionaries
-----------------+--------------------------
 asciihword      | english_stem
 asciiword       | rwg_synonym,english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem
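
Each row of this listing is an independent mapping: the dictionaries on a row are tried, in order, only for tokens of that row's type. The same information can be read from the system catalogs. This is a sketch that assumes the default parser and the pg_catalog.english configuration shown above.

```sql
-- List the dictionaries mapped to two token types, in the order they are tried.
-- ts_token_type('default') translates numeric token type ids into aliases.
SELECT tt.alias                    AS token_type,
       m.mapseqno                  AS position,
       m.mapdict::regdictionary    AS dictionary
FROM pg_ts_config_map AS m
JOIN ts_token_type('default') AS tt ON tt.tokid = m.maptokentype
WHERE m.mapcfg = 'pg_catalog.english'::regconfig
  AND tt.alias IN ('asciiword', 'numword')
ORDER BY tt.alias, m.mapseqno;
```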

On Sun, Nov 27, 2011 at 7:29 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Richard,

You should check your mapping - '1st' belongs to 'numword' and may be processed
in a different way than 'first' or '1'.

#4 Richard Greenwood
richard.greenwood@gmail.com
In reply to: Richard Greenwood (#3)
Re: text search synonym dictionary anomaly with numbers

To answer my own question - my synonym dictionary was not being applied
to '1st' because '1st' is a numword, not an asciiword, and my synonym
dictionary was not mapped to numword. To map a dictionary to a token class:

ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR numword WITH my_synonym_dictionary, simple;

The dictionary must already have been created with CREATE TEXT SEARCH
DICTIONARY.
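
The steps above can be sketched end to end as follows. The assumptions are mine, not from the thread: a synonym file named rwg_synonym.syn in $SHAREDIR/tsearch_data/ holding the two entries from the first message, and a copied configuration with the hypothetical name address_english, so that pg_catalog.english itself is left unchanged (the thread altered english directly).

```sql
-- Assumption: $SHAREDIR/tsearch_data/rwg_synonym.syn contains the lines
--   first 1
--   1st 1
CREATE TEXT SEARCH DICTIONARY rwg_synonym (
    TEMPLATE = synonym,
    SYNONYMS = rwg_synonym
);

-- Hypothetical configuration name; a copy keeps the built-in one intact.
CREATE TEXT SEARCH CONFIGURATION public.address_english (
    COPY = pg_catalog.english
);

-- 'first' is parsed as an asciiword, '1st' as a numword,
-- so the synonym dictionary has to be mapped to both token types.
ALTER TEXT SEARCH CONFIGURATION public.address_english
    ALTER MAPPING FOR asciiword WITH rwg_synonym, english_stem;
ALTER TEXT SEARCH CONFIGURATION public.address_english
    ALTER MAPPING FOR numword WITH rwg_synonym, simple;

-- Check that both spellings now normalize to the same lexeme.
SELECT to_tsquery('public.address_english', 'first') AS q_first,
       to_tsquery('public.address_english', '1st')   AS q_1st;
```

Changing a mapping only affects text processed afterwards. A stored tsvector column such as txtsrch keeps the lexemes it was built with until it is recomputed.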

Rich

On Sun, Nov 27, 2011 at 9:57 AM, Richard Greenwood
<richard.greenwood@gmail.com> wrote:

Oleg,

Thank you. I am sure that you have identified my problem.

 \dF+ english (output below) lists my dictionary which is named
'rwg_synonym' before numword so I would have thought that my
dictionary would have normalized '1st' to '1' before the numword
dictionary was reached. Maybe this question belongs in a new thread,
but I do thank you for helping me to look in the correct place.