text search and "filenames"
Hi,
I noticed that the default parser does not recognize Windows-style
filenames:
alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\archivos');
   alias   |   description   |  token
-----------+-----------------+----------
 asciiword | Word, all ASCII | c
 blank     | Space symbols   | :\
 asciiword | Word, all ASCII | archivos
(3 rows)
I played with it a bit (see attached patch -- basically I added \ in all
places where a / was being parsed, in the file-path states) and managed
to have it parse some naive versions, like
alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\archivos\\foo');
 alias |    description    |      token
-------+-------------------+-----------------
 file  | File or path name | c:\archivos\foo
(1 row)
However, it fails as soon as the path contains a space, which is quite
common on Windows. For example:
alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\Program Files\\');
   alias   |    description    |   token
-----------+-------------------+------------
 file      | File or path name | c:\Program
 blank     | Space symbols     |
 asciiword | Word, all ASCII   | Files
 blank     | Space symbols     | \
(4 rows)
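To illustrate why the space is the sticking point, here is a toy sketch in Python. It is not the parser's real state machine and not the attached patch; the names PATH_RE and first_path_token are made up for this example. It only models the idea described above: accept \ wherever / is accepted, and stop at the first character that cannot belong to a path component.

```python
import re

# Toy model only: this is NOT the PostgreSQL text search parser.
# Assumption for illustration: a "path" is an optional drive letter
# followed by one or more components, each introduced by "/" or "\".
PATH_RE = re.compile(r"(?:[A-Za-z]:)?(?:[/\\][\w.\-]+)+")


def first_path_token(text):
    """Return the leading path-like token of text, or None if there is none."""
    m = PATH_RE.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ("c:\\archivos\\foo", "c:\\Program Files\\", "\\\\server\\archivos\\foo"):
        print(repr(sample), "->", repr(first_path_token(sample)))
```

In this toy model the token ends at the space, because a space is not a legal component character, and a name starting with a doubled backslash is not matched at all, since the first separator is not followed by a component. Allowing spaces inside components would make it hard to tell where a path ends in ordinary running text.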
It also fails to recognize "network" file names, like
alvherre=# SELECT alias, description, token FROM ts_debug(e'\\\\server\\archivos\\foo');
   alias   |   description   |  token
-----------+-----------------+----------
 blank     | Space symbols   | \\
 asciiword | Word, all ASCII | server
 blank     | Space symbols   | \
 asciiword | Word, all ASCII | archivos
 blank     | Space symbols   | \
 asciiword | Word, all ASCII | foo
(6 rows)
Is this something worth worrying about?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Attachments:
tsearch-win-files.patch (text/x-diff; charset=us-ascii; +7 -0)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I noticed that the default parser does not recognize Windows-style
> filenames:
> Is this something worth worrying about?
I'm not too excited about it. The fact that there's a filename category
at all seems a bit of a wart to me, particularly since simple examples
like 'example.txt' don't get parsed that way. I definitely don't see
any good way to allow spaces in Windows filenames...
regards, tom lane