User defined types -- Social Security number...
Anyone have a good pre-built user-defined type definition for creating /
maintaining / manipulating a SSN ... where valid chars are in the range
000-00-0000 through 999-99-9999.
I imagine that the storage column is probably varchar(11) -- I am looking
for a type definition that
1) checks that all numbers are in the range [0..9]
2) inserts the dashes at position
3) accepts either '000123456' OR '000-12-3456' as the input
4) leading zeroes (as in the above) must be stored -- hence varchar(11)
instead of numeric
5) always outputs in thje format '000-12-3456'
Either that or the question is: How can I coerce postgreSQL into using an
input / output "mask"...
--
Greg Patnude / The Digital Demention
2916 East Upper Hayden Lake Road
Hayden Lake, ID 83835
(208) 762-0762
Greg,
Anyone have a good pre-built user-defined type definition for creating /
maintaining / manipulating a SSN ... where valid chars are in the range
000-00-0000 through 999-99-9999.
Actually, the range is more narrowly defined than that. I'm not sure of the
exact rules, but you will never see a leading 0 or a -00- in an SSN.
I imagine that the storage column is probably varchar(11) -- I am looking
for a type definition that
Use DOMAINs, not a custom type. It's less work.
Either that or the question is: How can I coerce postgreSQL into using an
input / output "mask"...
After you've created your DOMAIN, based on the TEXT type, you can overload the
input and output functions to format correctly. Beware, though: input &
output functions pretty much have to be written in C.
--
Josh Berkus
Aglio Database Solutions
San Francisco
Thanks Josh -- I understand that there are valid and invalid SSN's --
similar rules apply to zip codes and area codes...
I tried this:
SELECT to_char(123456789, '000-00-0000');
which yields 123-45-6789 -- nicely, I might add...
the trick is getting postgreSQL to do this without having to create an ON
SELECT and ON UPDATE TRIGGER...
an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick...
SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx" --
I do agree that there are valid ranges -- my main concern is being able to
store any leading zeros - I just need to make sure that something "looks"
like a valid SSN in the formattig
(nnn-nn-nnnn) and that I can store / retrieve it with the approoriate
format -- what I am really trying to accomplish is an "input mask"...
I hadn't considered using a Domain.... have to look at that....
--
Greg Patnude / The Digital Demention
2916 East Upper Hayden Lake Road
Hayden Lake, ID 83835
(208) 762-0762
"Josh Berkus" <josh@agliodbs.com> wrote in message
news:200402221132.50311.josh@agliodbs.com...
Greg,
Anyone have a good pre-built user-defined type definition for creating /
maintaining / manipulating a SSN ... where valid chars are in the range
000-00-0000 through 999-99-9999.Actually, the range is more narrowly defined than that. I'm not sure of
the
exact rules, but you will never see a leading 0 or a -00- in an SSN.
I imagine that the storage column is probably varchar(11) -- I am
looking
for a type definition that
Use DOMAINs, not a custom type. It's less work.
Either that or the question is: How can I coerce postgreSQL into using
an
input / output "mask"...
After you've created your DOMAIN, based on the TEXT type, you can overload
the
Show quoted text
input and output functions to format correctly. Beware, though: input &
output functions pretty much have to be written in C.--
Josh Berkus
Aglio Database Solutions
San Francisco---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
In the last exciting episode, "Greg Patnude" <gpatnude@hotmail.com> wrote:
Thanks Josh -- I understand that there are valid and invalid SSN's --
similar rules apply to zip codes and area codes...I tried this:
SELECT to_char(123456789, '000-00-0000');
which yields 123-45-6789 -- nicely, I might add...the trick is getting postgreSQL to do this without having to create an ON
SELECT and ON UPDATE TRIGGER...an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick...
SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx" --
I do agree that there are valid ranges -- my main concern is being able to
store any leading zeros - I just need to make sure that something "looks"
like a valid SSN in the formattig
(nnn-nn-nnnn) and that I can store / retrieve it with the approoriate
format -- what I am really trying to accomplish is an "input mask"...I hadn't considered using a Domain.... have to look at that....
Strongly recommended; that allows applying the validation in many
places without having to repeat validation "code."
If you will be using really a lot of these values, and indexing on
them, it even may be worth looking at a custom type.
A performance "win" would come in using a compact data type. For
instance, for 9 digit national ID numbers, you can do a LOT better
than an 11 byte string. (Aside: Anything bigger than 34 bits would
do, demonstrating that it is a regrettable loss that 36 bit computer
systems went the way of the dodo...)
--
output = reverse("gro.gultn" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/internet.html
I found out why cats drink out of the toilet. My mother told me it's
because it's cold in there. And I'm like: How did my mother know THAT?
--Wendy Liebman
Josh Berkus wrote:
Greg,
Anyone have a good pre-built user-defined type definition for creating /
maintaining / manipulating a SSN ... where valid chars are in the range
000-00-0000 through 999-99-9999.Actually, the range is more narrowly defined than that. I'm not sure of the
exact rules, but you will never see a leading 0 or a -00- in an SSN.
Actually I know someone who's SSN starts as 003.
Agreed on the DOMAIN suggestion.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sun, Feb 22, 2004 at 04:45:51PM -0800, Greg Patnude wrote:
Thanks Josh -- I understand that there are valid and invalid SSN's --
similar rules apply to zip codes and area codes...I tried this:
SELECT to_char(123456789, '000-00-0000');
which yields 123-45-6789 -- nicely, I might add...the trick is getting postgreSQL to do this without having to create an
ON
SELECT and ON UPDATE TRIGGER...an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick...
SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx"
--I do agree that there are valid ranges -- my main concern is being
able to
store any leading zeros - I just need to make sure that something
"looks"
like a valid SSN in the formattig
(nnn-nn-nnnn) and that I can store / retrieve it with the approoriate
format -- what I am really trying to accomplish is an "input mask"...
Look, you're thinking way too hard on this. An SSN is a 9-digit number,
nothing more. There are some 9-digit numbers which aren't valid SSN's,
and you might want to get fancy and create a constraint for that.
Regardless, you are making a *major* mistake of confusing data storage
with
rendering. It is common to *render* an SSN as xxx-xx-xxxx and its
cousin the FETID (Federal Employers Tax ID) as xx-xxxxxxx. To store the
dashes makes no sense. They're in the same place each time, it's wasted
data.
Store the SSN as an "integer". When you begin to think about this
correctly, the "leading zeros" problem disappears since that is also a
*rendering* issue.
When you pull the data out, either fix it up in your programming
language to the format that you wish, or use the to_char function as
shown above in your select statements.
To help you think about this whole issue, consider the timestamp
datatype. Timestamps are stored as a Julian date internally. I suspect
that they use a double-floating point as the actual format, but
regardless the point is that it's a number. Rather than storing
2004-02-29 21:14:27.030434-06
We store:
2453065.88503472
It's easier to use that as a basic format from which we can render it in
any way we wish.
The same applies to your SSN.
Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/
The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote:
Look, you're thinking way too hard on this. An SSN is a 9-digit number,
nothing more. There are some 9-digit numbers which aren't valid SSN's,
and you might want to get fancy and create a constraint for that.Regardless, you are making a *major* mistake of confusing data
storage with rendering. It is common to *render* an SSN as
xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as
xx-xxxxxxx. To store the dashes makes no sense. They're in the
same place each time, it's wasted data.Store the SSN as an "integer". When you begin to think about this
correctly, the "leading zeros" problem disappears since that is also a
*rendering* issue.
Well put.
The one thing that is a bit unfortunate is that 32 bit ints aren't
quite big enough for this. You need 1 extra digit :-(.
When you pull the data out, either fix it up in your programming
language to the format that you wish, or use the to_char function as
shown above in your select statements.
Using a view to hide the "physical" representation is also an idea.
A full scale type definition could make for an even more efficient
approach that makes the implementation appear invisible.
To help you think about this whole issue, consider the timestamp
datatype. Timestamps are stored as a Julian date internally. I
suspect that they use a double-floating point as the actual format,
but regardless the point is that it's a number. Rather than storing
Actually, it's an "int64"; a 64 bit integer, on platforms that support
that type. It's a "double" only on platforms that do not support that
type.
It's easier to use that as a basic format from which we can render
it in any way we wish.
Indeed.
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://cbbrowne.com/info/spiritual.html
"I owe the government $3400 in taxes. So I sent them two hammers and
a toilet seat." -- Michael McShane
I would represent an SSN as numeric(9,0).
an int 32 would work though.
2**31 is > 999999999
On Sun, 29 Feb 2004, Christopher Browne wrote:
Show quoted text
The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote:
Look, you're thinking way too hard on this. An SSN is a 9-digit number,
nothing more. There are some 9-digit numbers which aren't valid SSN's,
and you might want to get fancy and create a constraint for that.Regardless, you are making a *major* mistake of confusing data
storage with rendering. It is common to *render* an SSN as
xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as
xx-xxxxxxx. To store the dashes makes no sense. They're in the
same place each time, it's wasted data.Store the SSN as an "integer". When you begin to think about this
correctly, the "leading zeros" problem disappears since that is also a
*rendering* issue.Well put.
The one thing that is a bit unfortunate is that 32 bit ints aren't
quite big enough for this. You need 1 extra digit :-(.When you pull the data out, either fix it up in your programming
language to the format that you wish, or use the to_char function as
shown above in your select statements.Using a view to hide the "physical" representation is also an idea.
A full scale type definition could make for an even more efficient
approach that makes the implementation appear invisible.To help you think about this whole issue, consider the timestamp
datatype. Timestamps are stored as a Julian date internally. I
suspect that they use a double-floating point as the actual format,
but regardless the point is that it's a number. Rather than storingActually, it's an "int64"; a 64 bit integer, on platforms that support
that type. It's a "double" only on platforms that do not support that
type.It's easier to use that as a basic format from which we can render
it in any way we wish.Indeed.
On Monday 01 March 2004 8:54 am, Dana Hudes wrote:
I would represent an SSN as numeric(9,0).
an int 32 would work though.
2**31 is > 999999999On Sun, 29 Feb 2004, Christopher Browne wrote:
The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney)
wrote:
Look, you're thinking way too hard on this. An SSN is a
9-digit number, nothing more. There are some 9-digit numbers
which aren't valid SSN's, and you might want to get fancy and
create a constraint for that.Regardless, you are making a *major* mistake of confusing data
storage with rendering. It is common to *render* an SSN as
xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID)
as xx-xxxxxxx. To store the dashes makes no sense. They're in
the same place each time, it's wasted data.
I missed the start of this thread but will chime in with a comment
anyway.
My rule is to select an appropriate numeric type of data if you will
be doing numeric types of things to it, character types if you will
be doing character manipulations, etc.
I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN),
SSN+7.86 but there are plenty of good reasons to need the first three
characters (the "area number"), the middle two characters (the "group
number", and the last 4 characters (the "serial number", often
(ab)used as a password for banking and other purposes).
While the dashes certainly don't need to be stored, they are not in
arbitrary positions as they delimit the parts of the SSN noted above.
One might even want to store the SSN in three columns, the first
linked to a table of valid area numbers, the second error checked so
"00" is not valid and so on or get even more fancy and error check
against: http://www.ssa.gov/employer/highgroup.txt. It all depends on
one's specific requirements.
Google and you will find SSN info pages such as:
http://proagency.tripod.com/usasssearch.html
http://www.ssa.gov/foia/stateweb.html
Cheers,
Steve
On Sun, Feb 29, 2004 at 11:11:31PM -0500, Christopher Browne wrote:
The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote:
Look, you're thinking way too hard on this. An SSN is a 9-digit number,
nothing more. There are some 9-digit numbers which aren't valid SSN's,
and you might want to get fancy and create a constraint for that.Regardless, you are making a *major* mistake of confusing data
storage with rendering. It is common to *render* an SSN as
xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as
xx-xxxxxxx. To store the dashes makes no sense. They're in the
same place each time, it's wasted data.Store the SSN as an "integer". When you begin to think about this
correctly, the "leading zeros" problem disappears since that is also a
*rendering* issue.Well put.
The one thing that is a bit unfortunate is that 32 bit ints aren't
quite big enough for this. You need 1 extra digit :-(.
For what? The largest SSN is 999,999,999, a signed 32-bit int goes to
just over 2,000,000,000. Ever hear of a "4GB limit"?
Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/
On Mon, Mar 01, 2004 at 09:42:48AM -0800, Steve Crawford wrote:
I missed the start of this thread but will chime in with a comment
anyway.My rule is to select an appropriate numeric type of data if you will
be doing numeric types of things to it, character types if you will
be doing character manipulations, etc.I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN),
SSN+7.86 but there are plenty of good reasons to need the first three
characters (the "area number"), the middle two characters (the "group
number", and the last 4 characters (the "serial number", often
(ab)used as a password for banking and other purposes).
Another excellent point. I often store zip codes as text for this
reason.
The only other thing that I would mention is that if the SSN field in
the db will be a key of some sort, which is often the case, then it
might be more efficient to store it as an integer. It might be more
efficient to store it as a character string. The author should test
in this case to determine the most efficient way.
As for character vs. integer manipulations, in most scripting style
languages, which is pretty much exlusively what I use, there's no
need to think about types, and something like an SSN will silently
change between being character or integer depending on what operations
are being performed on it.
Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/
"Michael Chaney" <mdchaney@michaelchaney.com> wrote in message
news:20040301211843.GB19105@michaelchaney.com...
On Mon, Mar 01, 2004 at 09:42:48AM -0800, Steve Crawford wrote:
I missed the start of this thread but will chime in with a comment
anyway.My rule is to select an appropriate numeric type of data if you will
be doing numeric types of things to it, character types if you will
be doing character manipulations, etc.I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN),
SSN+7.86 but there are plenty of good reasons to need the first three
characters (the "area number"), the middle two characters (the "group
number", and the last 4 characters (the "serial number", often
(ab)used as a password for banking and other purposes).Another excellent point. I often store zip codes as text for this
reason.The only other thing that I would mention is that if the SSN field in
the db will be a key of some sort, which is often the case, then it
might be more efficient to store it as an integer. It might be more
efficient to store it as a character string. The author should test
in this case to determine the most efficient way.As for character vs. integer manipulations, in most scripting style
languages, which is pretty much exlusively what I use, there's no
need to think about types, and something like an SSN will silently
change between being character or integer depending on what operations
are being performed on it.Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Ther are some other points I'd like to make --
If I store the SSN as an integer -- theoretically -- leading zeroes will be
stripped (041-99-9999) -- my OWN ssn is a perfect example of this as it
starts with a leading zero...
This would cause a problem in that one of the requirements of an SSN is that
the length be exactly 9 digits or 9 chars WITHOUT the dashes.... so a CHECK
CONSTRAINT would be useful...
But if the SSN is stored as an integer -- there is no check constraint that
wouldn't fail for SSNs that start with one or more zeroes.... So I thought
how about a varchar(9) field and insert/update triggers that do the
formatting (adding the dashes on insert/update --) and validate the check
contraints (9 chars + the dashes)...
The two extra characters making a varchar(11) field are not a concern in the
normalization or schema... I simply wanted a formatting function so that I
dont have to do it in my scripting language or use the same CAST over and
over and over in my select/insert/update statements....
I am mainly looking to do the formatting automatically rather than having to
constantly format such a simple piece of data...
It would be really sweet in postgreSQL if we could apply the equivalent of a
printf(columnname) to the table definition -- MS Access has what they call
an "input mask" and it comes in really handy -- however -- I havent used
Access for anthing serious for about 4 years...
--
Greg Patnude / The Digital Demention
2916 East Upper Hayden Lake Road
Hayden Lake, ID 83835
(208) 762-0762
"Greg Patnude" <gpatnude@hotmail.com> writes:
It would be really sweet in postgreSQL if we could apply the
equivalent of a printf(columnname) to the table definition -- MS
Access has what they call an "input mask" and it comes in really
handy -- however -- I havent used Access for anthing serious for
about 4 years...
Hmm.
% man "create type"
NAME
CREATE TYPE - define a new data type
SYNOPSIS
CREATE TYPE name AS
( attribute_name data_type [, ... ] )
CREATE TYPE name (
INPUT = input_function,
OUTPUT = output_function
[ , RECEIVE = receive_function ]
[ , SEND = send_function ]
[ , INTERNALLENGTH = { internallength | VARIABLE } ]
[ , PASSEDBYVALUE ]
[ , ALIGNMENT = alignment ]
[ , STORAGE = storage ]
[ , DEFAULT = default ]
[ , ELEMENT = element ]
[ , DELIMITER = delimiter ]
)
"input_function" and "output_function" sure _sound_ like what you're
looking for.
They would make your imagined issue of "missing leading zeros" go
away.
I think that CREATE TYPE has been around for a goodly few years now;
it is not impossible that this functionality has been around longer
than MS Access (which is saying something!).
--
output = reverse("gro.mca" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/x.html
Rules of the Evil Overlord #116. "If I capture the hero's starship, I
will keep it in the landing bay with the ramp down, only a few token
guards on duty and a ton of explosives set to go off as soon as it
clears the blast-range." <http://www.eviloverlord.com/>
"Greg Patnude" <gpatnude@hotmail.com> wrote in message
news:c22ceg$s1a$1@news.hub.org...
"Michael Chaney" <mdchaney@michaelchaney.com> wrote in message
news:20040301211843.GB19105@michaelchaney.com...On Mon, Mar 01, 2004 at 09:42:48AM -0800, Steve Crawford wrote:
I missed the start of this thread but will chime in with a comment
anyway.My rule is to select an appropriate numeric type of data if you will
be doing numeric types of things to it, character types if you will
be doing character manipulations, etc.I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN),
SSN+7.86 but there are plenty of good reasons to need the first three
characters (the "area number"), the middle two characters (the "group
number", and the last 4 characters (the "serial number", often
(ab)used as a password for banking and other purposes).Another excellent point. I often store zip codes as text for this
reason.The only other thing that I would mention is that if the SSN field in
the db will be a key of some sort, which is often the case, then it
might be more efficient to store it as an integer. It might be more
efficient to store it as a character string. The author should test
in this case to determine the most efficient way.As for character vs. integer manipulations, in most scripting style
languages, which is pretty much exlusively what I use, there's no
need to think about types, and something like an SSN will silently
change between being character or integer depending on what operations
are being performed on it.Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
joining column's datatypes do not match
Ther are some other points I'd like to make --
If I store the SSN as an integer -- theoretically -- leading zeroes will
be
stripped (041-99-9999) -- my OWN ssn is a perfect example of this as it
starts with a leading zero...This would cause a problem in that one of the requirements of an SSN is
that
the length be exactly 9 digits or 9 chars WITHOUT the dashes.... so a
CHECK
CONSTRAINT would be useful...
But if the SSN is stored as an integer -- there is no check constraint
that
wouldn't fail for SSNs that start with one or more zeroes.... So I thought
how about a varchar(9) field and insert/update triggers that do the
formatting (adding the dashes on insert/update --) and validate the check
contraints (9 chars + the dashes)...The two extra characters making a varchar(11) field are not a concern in
the
normalization or schema... I simply wanted a formatting function so that I
dont have to do it in my scripting language or use the same CAST over and
over and over in my select/insert/update statements....I am mainly looking to do the formatting automatically rather than having
to
constantly format such a simple piece of data...
It would be really sweet in postgreSQL if we could apply the equivalent of
a
printf(columnname) to the table definition -- MS Access has what they call
an "input mask" and it comes in really handy -- however -- I havent used
Access for anthing serious for about 4 years...--
Greg Patnude / The Digital Demention
2916 East Upper Hayden Lake Road
Hayden Lake, ID 83835
(208) 762-0762
You might want to look at CREATE DOMAIN
e.g. (for ISBNs, we want to check the format, and the check digit - replace
with suitable regex and validation function for social security numbers)
CREATE OR REPLACE FUNCTION utilities."validate_ISBN_check_digit"(char(10))
RETURNS boolean AS '
DECLARE
isbn_sum int:=0;
BEGIN
IF ($1 ~ ''^[0-9]{9}[0-9Xx]{1}$''::text) THEN
FOR i IN 1..10 LOOP
isbn_sum:= CASE
WHEN substring($1 from i for 1) IN (''X'',''x'') AND
i=10 THEN isbn_sum + (11-i * 10)
ELSE isbn_sum + (11-i * substring($1 from i for
1)::int)
END;
END LOOP;
IF mod(isbn_sum,11) = 0 THEN
RETURN ''t'';
END IF;
END IF;
RETURN ''f'';
END;
' LANGUAGE 'plpgsql';
COMMENT ON FUNCTION utilities."validate_ISBN_check_digit"(char(10)) is
'validation function for ISBN check digits';
CREATE DOMAIN utilities."ISBN" AS char(10) CONSTRAINT "ISBN format" CHECK
(VALUE ~ '^[0-9]{9}[0-9Xx]{1}$'::text) CONSTRAINT "ISBN checkdigit" CHECK
(utilities."validate_ISBN_check_digit"(VALUE));
Hope that helps
--
Tom Hebbron
www.hebbron.com