Email parsing in Text Search

Started by Martin Dubé over 9 years ago, 3 messages, bugs
#1 Martin Dubé
martin.dube@gmail.com

Hi,

I'm seeing some odd behavior from the email parser and wonder whether it is a bug
or a feature.

When I use the default regconfig to parse an email address whose first part
(the local part) is numbers only, it is not parsed as an email.

db=# select * from ts_debug('pg_catalog.english', '000000001@asdf.com');
 alias |   description    |   token   | dictionaries | dictionary |   lexemes
-------+------------------+-----------+--------------+------------+-------------
 uint  | Unsigned integer | 000000001 | {simple}     | simple     | {000000001}
 blank | Space symbols    | @         | {}           |            |
 host  | Host             | asdf.com  | {simple}     | simple     | {asdf.com}
(3 rows)

However, if I add a letter, it is parsed as an email.

db=# select * from ts_debug('pg_catalog.english', '000000001a@asdf.com');
 alias |  description  |        token        | dictionaries | dictionary |        lexemes
-------+---------------+---------------------+--------------+------------+-----------------------
 email | Email address | 000000001a@asdf.com | {simple}     | simple     | {000000001a@asdf.com}
(1 row)

According to the RFC (RFC 5322) and several forums, an email address whose
local part consists only of digits is valid.
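To make the RFC point concrete, here is a minimal Python sketch of the RFC 5322 "dot-atom" rule for the local part. In that grammar, digits are ordinary atext characters, so an all-digit local part is allowed. The helper name local_part_is_dot_atom is hypothetical. The check is deliberately simplified: it ignores quoted-string local parts, comments/folding whitespace and the obsolete syntax. It is not how PostgreSQL's text search parser works.

```python
import re

# RFC 5322 "atext": ASCII letters, digits, and these printable symbols.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_RE = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def local_part_is_dot_atom(address: str) -> bool:
    """Hypothetical helper: True if the text before the last '@' is a dot-atom.

    Simplified sketch: quoted-string local parts, CFWS and obs-local-part
    are not handled, and the domain is only checked for being non-empty.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return False
    return DOT_ATOM_RE.fullmatch(local) is not None


for addr in ("000000001@asdf.com", "000000001a@asdf.com", ".bad@asdf.com"):
    print(addr, local_part_is_dot_atom(addr))
```

Both addresses from the ts_debug examples pass this check. A local part starting with a dot does not.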

Is this normal behavior?

I ran the test on OpenBSD 5.9 with PostgreSQL 9.4.6.

Thanks,

--
Mart

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martin Dubé (#1)
Re: Email parsing in Text Search

Martin Dubé <martin.dube@gmail.com> writes:

> When using the default regconfig and parse an email where the first part is
> numbers only, it is not parsed as an email.

This has been changed for 9.6:

* Fix the default text search parser to allow leading digits in email and host tokens (Artur Zakirov)

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)

#3 Martin Dubé
martin.dube@gmail.com
In reply to: Tom Lane (#2)
Re: Email parsing in Text Search

I should have seen that! Thank you very much!

On Wed, Sep 7, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

--
Mart