Email parsing in Text Search
Hi,
I'm seeing odd behavior with the email parser and wonder whether it is a bug
or a feature.
When I use the default regconfig to parse an email address whose first part
is numbers only, it is not parsed as an email.
db=# select * from ts_debug('pg_catalog.english', '000000001@asdf.com');
 alias |   description    |   token   | dictionaries | dictionary |   lexemes
-------+------------------+-----------+--------------+------------+-------------
 uint  | Unsigned integer | 000000001 | {simple}     | simple     | {000000001}
 blank | Space symbols    | @         | {}           |            |
 host  | Host             | asdf.com  | {simple}     | simple     | {asdf.com}
(3 rows)
However, if I add a letter, it is parsed as an email.
db=# select * from ts_debug('pg_catalog.english', '000000001a@asdf.com');
 alias |  description  |        token        | dictionaries | dictionary |        lexemes
-------+---------------+---------------------+--------------+------------+-----------------------
 email | Email address | 000000001a@asdf.com | {simple}     | simple     | {000000001a@asdf.com}
(1 row)
According to the RFC and several forums, an email address whose first part
contains only numbers is valid.
Is this normal behavior?
I ran the test on OpenBSD 5.9 with PostgreSQL 9.4.6.
Thanks,
--
Mart
Martin Dubé <martin.dube@gmail.com> writes:
> When using the default regconfig and parse an email where the first part is
> numbers only, it is not parsed as an email.
This has been changed for 9.6:
* Fix the default text search parser to allow leading digits in email and host tokens (Artur Zakirov)
regards, tom lane
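To check whether a given server already has this change, the original query can be re-run after confirming the server version. This is only a sketch that reuses the query from the report; no output is shown because it depends on the version in use. Per the release note above, 9.6 or later should return the address as a single email token.

```sql
-- Confirm which PostgreSQL version the server is running;
-- the release note quoted above lists the fix for 9.6.
SHOW server_version;

-- Re-run the check from the original report, keeping only the
-- columns needed to see how the address was tokenized.
SELECT alias, token, lexemes
FROM ts_debug('pg_catalog.english', '000000001@asdf.com');
```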
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
I should have seen that! Thank you very much!
On Wed, Sep 7, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Martin Dubé <martin.dube@gmail.com> writes:
>> When using the default regconfig and parse an email where the first part is
>> numbers only, it is not parsed as an email.
> This has been changed for 9.6:
> * Fix the default text search parser to allow leading digits in
>   email and host tokens (Artur Zakirov)
> regards, tom lane
--
Mart