ERROR: syntax error in tsquery - for high-unicode whitespace

Started by hubert depesz lubaczewski, about 13 years ago | 3 messages | bugs

Hi,
This was tested on 9.1 and 9.3. Interestingly, it worked without error in
8.2.

$ select to_tsquery('english', E'a\xe2\x80\x86a');
ERROR: syntax error in tsquery: "a a"

The 3-byte UTF-8 character is SIX-PER-EM SPACE, U+2006 (based on info from
http://www.fileformat.info/info/unicode/char/2006/index.htm)
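
The code point can also be double-checked from psql. This is only a sketch, and it assumes the database encoding is UTF8, in which case ascii() returns the Unicode code point of the first character of its argument:

```sql
-- Assumption: the database encoding is UTF8.
-- Code point of the character, printed in hex.
select to_hex(ascii(E'\xe2\x80\x86'));      -- 2006, i.e. U+2006

-- The same string is one character long but occupies three bytes.
select length(E'\xe2\x80\x86')       as chars,
       octet_length(E'\xe2\x80\x86') as bytes;
```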

I am not sure what should happen with it, but I generally expected
whitespace characters to be ignored (treated as separators) when
building a tsquery.

It seems to work that way when building tsvector though:

$ select to_tsvector('english', E'a\xe2\x80\x86a');
to_tsvector
-------------

(1 row)

and for larger example:

$ select to_tsvector('english', E'depesz\xe2\x80\x86whatever');
to_tsvector
-----------------------
'depesz':1 'whatev':2
(1 row)

$ select to_tsquery('english', E'depesz\xe2\x80\x86whatever');
ERROR: syntax error in tsquery: "depesz whatever"

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: hubert depesz lubaczewski (#1)
Re: ERROR: syntax error in tsquery - for high-unicode whitespace

hubert depesz lubaczewski <depesz@depesz.com> writes:

> $ select to_tsquery('english', E'a\xe2\x80\x86a');
> ERROR: syntax error in tsquery: "a a"
>
> the 3-byte utf8 character is SIX-PER-EM SPACE (based on info from
> http://www.fileformat.info/info/unicode/char/2006/index.htm)

AFAICS, that behavior is correct, if you're using a locale that reports
that character (U+2006) as being whitespace. Compare

u8e=# select to_tsquery('english', E'a a');
ERROR: syntax error in tsquery: "a a"

You need an ampersand or something in there.
Or use plainto_tsquery().

regards, tom lane
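
The two suggestions above could look roughly like this in practice. This is a sketch rather than a definitive recipe: the replace() call is only one hypothetical way to turn the separator into an explicit operator, and the plainto_tsquery() result assumes the parser treats U+2006 as a separator, as it did for to_tsvector() in the original report.

```sql
-- Option 1: keep to_tsquery(), but put an explicit operator between the words.
select to_tsquery('english', 'depesz & whatever');

-- Option 1 applied to input that contains the six-per-em space:
-- swap the character for ' & ' before parsing (illustrative only).
select to_tsquery('english',
                  replace(E'depesz\xe2\x80\x86whatever', E'\xe2\x80\x86', ' & '));

-- Option 2: plainto_tsquery() takes unformatted text and inserts
-- the & operators between the words itself.
select plainto_tsquery('english', E'depesz\xe2\x80\x86whatever');
```

to_tsquery() expects query syntax, so two words separated only by whitespace are a syntax error, while plainto_tsquery() accepts plain text and never treats its input as operators.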


#3 hubert depesz lubaczewski
In reply to: Tom Lane (#2)
Re: ERROR: syntax error in tsquery - for high-unicode whitespace

On Thu, Mar 14, 2013 at 11:56:19PM -0400, Tom Lane wrote:

> hubert depesz lubaczewski <depesz@depesz.com> writes:
>
> > $ select to_tsquery('english', E'a\xe2\x80\x86a');
> > ERROR: syntax error in tsquery: "a a"
> >
> > the 3-byte utf8 character is SIX-PER-EM SPACE (based on info from
> > http://www.fileformat.info/info/unicode/char/2006/index.htm)
>
> AFAICS, that behavior is correct, if you're using a locale that reports
> that character (U+2006) as being whitespace. Compare
>
> u8e=# select to_tsquery('english', E'a a');
> ERROR: syntax error in tsquery: "a a"
>
> You need an ampersand or something in there.
> Or use plainto_tsquery().

Right. Thanks. Not sure how I missed that.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
