email data type first release

Started by Gaetano Mendolaover 21 years ago6 messages
#1Gaetano Mendola
mendola@bigfoot.com

I already post this message using pgsql-hackers@postgresql.org
with attached a binary file but I guess the newsgroup not
accept binary file.

The following was almost the message, now the file is on pg_foundry (
now it's working ).

=====================================================================
Hi all, this is the alpha version for the new email data type,
is not written as built in type but as plug in, Tome Lane and others
drove me in this direction.

The type is indexable and provide also conversion methods:
text <--> email

It also defined the operator >>, is possible use it in select like:

select * from my_user where email >> 'hotmail.com';

this select will extract all records with an email inside
the domain 'hotmail.com'.

The validation routine is very simple, I focused my attention to made all
this working, right now test only the presence of character '@', it's an
alpha version after all), I'd like to put it on pgfoundry but apparently
my DNS are unable to resolve www.pgfoundry.org. I'll put this version as
soon the address become available.

This is the first time that I wrote code for postgresql so please, if
you can, do a sort of code revision on it.

Comments are welcomed.
=====================================================================

Regards
Gaetano Mendola.

#2Greg Stark
gsstark@mit.edu
In reply to: Gaetano Mendola (#1)
Re: email data type first release

Gaetano Mendola <mendola@bigfoot.com> writes:

Comments are welcomed.

Well as long as you're asking...

Email domains are case insensitive, but the left hand side is case sensitive.
That's the only part that's hard to handle using a text data type, it would be
kind of neat if the email operators got it right.

Another thing is that it might make more sense to sort email addresses by
domain first (case insensitively of course), then by left hand side (case
sensitively). Since the domain is really the "most significant bit". This is
also convenient for many systems like email since they perform better when
they can handle data in that order.

Note that this would make the type sort differently from its text
representation. This shouldn't really be a problem but occasionally you see
poorly written queries that introduce extra type conversions that the user
doesn't expect. But then if it behaves just like the text datatype then there
wouldn't be much point in using it.

--
greg

#3Tommi Maekitalo
t.maekitalo@epgmbh.de
In reply to: Greg Stark (#2)
Re: email data type first release

...

Another thing is that it might make more sense to sort email addresses by
domain first (case insensitively of course), then by left hand side (case
sensitively). Since the domain is really the "most significant bit". This
is also convenient for many systems like email since they perform better
when they can handle data in that order.

Note that this would make the type sort differently from its text
representation. This shouldn't really be a problem but occasionally you see
poorly written queries that introduce extra type conversions that the user
doesn't expect. But then if it behaves just like the text datatype then
there wouldn't be much point in using it.

Sorting should then be done by top-level-domain first. Then 2nd, 3rd... and
last by user.

b@a.domain.com < a@b.domain.com
and
b@b.a.com < b@a.b.com

we get then the order:
a@a.a.com < a@b.a.com < a@a.b.com < a@b.b.com

rather than (in normal text-order):
a@a.a.com < a@a.b.com < a@b.a.com < a@a.b.com

Tommi

#4Gaetano Mendola
mendola@bigfoot.com
In reply to: Greg Stark (#2)
Re: email data type first release

Greg Stark wrote:

Gaetano Mendola <mendola@bigfoot.com> writes:

Comments are welcomed.

Well as long as you're asking...

Email domains are case insensitive, but the left hand side is case sensitive.
That's the only part that's hard to handle using a text data type, it would be
kind of neat if the email operators got it right.

Another thing is that it might make more sense to sort email addresses by
domain first (case insensitively of course), then by left hand side (case
sensitively). Since the domain is really the "most significant bit". This is
also convenient for many systems like email since they perform better when
they can handle data in that order.

Note that this would make the type sort differently from its text
representation. This shouldn't really be a problem but occasionally you see
poorly written queries that introduce extra type conversions that the user
doesn't expect. But then if it behaves just like the text datatype then there
wouldn't be much point in using it.

That's true, I will order as Tommi Maekitalo suggest.

Regards
Gaetano Mendola

#5Bruno Wolff III
bruno@wolff.to
In reply to: Gaetano Mendola (#4)
Re: email data type first release

On Mon, May 17, 2004 at 17:11:42 +0200,
Gaetano Mendola <mendola@bigfoot.com> wrote:

That's true, I will order as Tommi Maekitalo suggest.

And how do domain literals fit into this?

bruno@[66.93.249.74] is a valid email address for me. (At least as
long as my server is at that IP address.)

#6Greg Stark
gsstark@mit.edu
In reply to: Tommi Maekitalo (#3)
Re: email data type first release

Tommi Maekitalo <t.maekitalo@epgmbh.de> writes:

Sorting should then be done by top-level-domain first. Then 2nd, 3rd... and
last by user.

I thought of that but decided not to suggest it:

a) as far as email goes there's no relationship between xxx@foo.com and
xxx@bar.com. The ".com" doesn't mean the emails are any more related than
xxx@foo.com and xxx@foo.org are. In fact in practice the latter two are
more likely to be related.

b) it's a lot of extra work, whereas sorting by domain first is just as easy
as sorting by lhs first.

--
greg