primary key and existing unique fields
Hi all,
I am wandering about the pros and cons of creating a separate serial field
for a primary key when I already have a single unique field. This existing
unique field will have to be a character of fixed length (VARCHAR(12))
because although it's a numeric value there will be leading zeroes. There
are a couple more tables with similar unique fields and one of them would
need to reference the others. Does anybody see any good reason for adding a
separate autoincrement primary key field for each table? or either way is
not a big deal.
Sally
_________________________________________________________________
Don�t just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/
Since you already have the unique field I see no point in adding a sequence
to the table, unless of course the sequence of the data inserts is of
importance at some point.
Duane
-----Original Message-----
From: Sally Sally [mailto:dedeb17@hotmail.com]
Sent: Tuesday, October 26, 2004 9:25 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] primary key and existing unique fields
Hi all,
I am wandering about the pros and cons of creating a separate serial field
for a primary key when I already have a single unique field. This existing
unique field will have to be a character of fixed length (VARCHAR(12))
because although it's a numeric value there will be leading zeroes. There
are a couple more tables with similar unique fields and one of them would
need to reference the others. Does anybody see any good reason for adding a
separate autoincrement primary key field for each table? or either way is
not a big deal.
Sally
_________________________________________________________________
Don't just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Import Notes
Resolved by subject fallback
Sally Sally wrote:
Hi all,
I am wandering about the pros and cons of creating a separate serial
field for a primary key when I already have a single unique field. This
existing unique field will have to be a character of fixed length
(VARCHAR(12)) because although it's a numeric value there will be
leading zeroes. There are a couple more tables with similar unique
fields and one of them would need to reference the others. Does anybody
see any good reason for adding a separate autoincrement primary key
field for each table? or either way is not a big deal.
Your primary key should not be directly related to the data being
stored. Outside of the fact that it is the primary reference or the row.
Sincerely,
Joshua D. Drake
Sally
_________________________________________________________________
Don’t just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Can you please elaborate on the point you just made as to why the primary
key should not relate to the data (even for a case when there is an existing
unique field that can be used to identify the record)
From: "Joshua D. Drake" <jd@commandprompt.com>
To: Sally Sally <dedeb17@hotmail.com>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] primary key and existing unique fields
Date: Tue, 26 Oct 2004 09:48:50 -0700Sally Sally wrote:
Hi all,
I am wandering about the pros and cons of creating a separate serial field
for a primary key when I already have a single unique field. This existing
unique field will have to be a character of fixed length (VARCHAR(12))
because although it's a numeric value there will be leading zeroes. There
are a couple more tables with similar unique fields and one of them would
need to reference the others. Does anybody see any good reason for adding
a separate autoincrement primary key field for each table? or either way
is not a big deal.Your primary key should not be directly related to the data being stored.
Outside of the fact that it is the primary reference or the row.Sincerely,
Joshua D. Drake
Sally
_________________________________________________________________
Dont just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend-- Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL << jd.vcf >>---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
_________________________________________________________________
Check out Election 2004 for up-to-date election news, plus voter tools and
more! http://special.msn.com/msn/election2004.armx
Import Notes
Resolved by subject fallback
Sally Sally wrote:
Can you please elaborate on the point you just made as to why the
primary key should not relate to the data (even for a case when there is
an existing unique field that can be used to identify the record)
Here is a good article on the topic:
http://www.devx.com/ibm/Article/20702
Sincerely,
Joshua D. Drake
--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Joshua D. Drake wrote:
Sally Sally wrote:
Can you please elaborate on the point you just made as to why the
primary key should not relate to the data (even for a case when there
is an existing unique field that can be used to identify the record)Here is a good article on the topic:
That article makes me want to vomit uncontrollably! ;-)
"Business data might also simply be bad -- glitches in the Social
Security Administration's system may lead to different persons having
the same Social Security Number. A surrogate key helps to isolate the
system from such problems."
The surrogate key isn't solving the underlying logical inconsistency
problem. It is being used as a work-around to cover one up. I suspect
the author of being a MySQL user.
Mike Mascari
Here is a good article on the topic:
The surrogate key isn't solving the underlying logical inconsistency
problem. It is being used as a work-around to cover one up. I suspect
the author of being a MySQL user.
Actually he is a software project consultant for IEEE.org and he holds
a Ph.D. in Theoretical Physics.
Sincerely,
Joshua D. Drake
Mike Mascari
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Joshua D. Drake wrote:
Here is a good article on the topic:
The surrogate key isn't solving the underlying logical inconsistency
problem. It is being used as a work-around to cover one up. I suspect
the author of being a MySQL user.Actually he is a software project consultant for IEEE.org and he holds
a Ph.D. in Theoretical Physics.
<joking>
Apparently gamma functions and string theory have little to do with
understanding the relational model of data.
</joking>
Seriously, my only point was that Date & Darwen and other relational
purists do not use surrogate keys. The surrogate-key vs. speaking-key
debate devolves quickly. I'd only argue that it is wrong as the author
of the article implied that the speaking-key side of the debate is
without merit.
He begins:
"For the purpose of data modeling, the plumbing should be largely
transparent. In fact, purist DB lore makes no distinction between data
and plumbing. However, you will see that it is more efficient for
administration and maintenance, as well as in terms of runtime
performance, to have some additional fields to serve as DB keys."
So he dismisses the speaking-key argument in one sentence as "purist DB
lore." He then proceeds with a poor example:
"The requirements for a primary key are very strict. It must:
Exist
Be unique
Not change over time
Surrogate keys help to mitigate the fact that real business data never
reliably fulfills these requirements. Not every person has a Social
Security Number (think of those outside the U.S.), people change their
names, and other important information."
1. The reason we have ON UPDATE CASCADE is to handle changes in primary
keys.
2. If not everyone has a social security number than the design should
be sufficiently normalized to reflect that fact.
I am not saying that Chris Date and Hugh Darwen are right and that
Philipp Janert is wrong. I am only saying that both sides should be
investigated and judged on the weight of their arguments.
Personally, I've found over time that when I deviate from "purist DB
lore" I get punished in long run.
Could be wrong, though. :-)
Mike Mascari
On Tue, 26 Oct 2004 16:24:44 +0000, Sally Sally <dedeb17@hotmail.com> wrote:
Hi all,
I am wandering about the pros and cons of creating a separate serial field
for a primary key when I already have a single unique field. This existing
unique field will have to be a character of fixed length (VARCHAR(12))
because although it's a numeric value there will be leading zeroes. There
are a couple more tables with similar unique fields and one of them would
need to reference the others. Does anybody see any good reason for adding a
separate autoincrement primary key field for each table? or either way is
not a big deal.
I see three possible advantages:
1. having varchar(12) in every referencing table, takes more storage
space, which may mean something if you have tons of gigabytes of rows.
;)
2. if any of your varchar(12) row's data is likely to change in
future, you'll make update of one table, not an update which will
CASCADE over many tables.
3. If you'll need things like "last 50 keys", you can SELECT * FROM
foo ORDER BY yourserialkey DESC LIMIT 50;
And a disadvantage:
if you'll need to access the data by your varchar(12) key, you'll need
to perform JOIN on two tables. If you used varchar(12) as your key,
you don't. :)
Regards,
Dawid
Look at the database design in terms of data retrieval. If I add a sequence
number as my primary key, when I get ready to retrieve that record
"directly" how do I know what that sequence number is. For instance, my
employee number is 123456789, and it is unique within my company and my
sequence number is 375. I will more likely know the employee number to
query than I will the sequence number. Sure saves time in data access. I
know, you can always create a unique index on the employee number as well as
a primary index on the sequence number but WHY would I want to take up room
for a field in the record as well as the useless index space for no purpose.
Codd said "the key, the whole key and nothing but the key." In database
design we cannot always do this for query performance but why add something
to a record that will serve no real purpose. Keys are updateable, hopefully
they are not changed all that often but the ability should be there. If
they are not updateable then the database engine is not one I would choose
for my application.
Duane
-----Original Message-----
From: Mike Mascari [mailto:mascarm@mascari.com]
Sent: Tuesday, October 26, 2004 2:26 PM
To: Joshua D. Drake
Cc: Sally Sally; pgsql-general@postgresql.org
Subject: Re: [GENERAL] primary key and existing unique fields
Joshua D. Drake wrote:
Here is a good article on the topic:
The surrogate key isn't solving the underlying logical inconsistency
problem. It is being used as a work-around to cover one up. I suspect
the author of being a MySQL user.Actually he is a software project consultant for IEEE.org and he holds
a Ph.D. in Theoretical Physics.
<joking>
Apparently gamma functions and string theory have little to do with
understanding the relational model of data.
</joking>
Seriously, my only point was that Date & Darwen and other relational
purists do not use surrogate keys. The surrogate-key vs. speaking-key
debate devolves quickly. I'd only argue that it is wrong as the author
of the article implied that the speaking-key side of the debate is
without merit.
He begins:
"For the purpose of data modeling, the plumbing should be largely
transparent. In fact, purist DB lore makes no distinction between data
and plumbing. However, you will see that it is more efficient for
administration and maintenance, as well as in terms of runtime
performance, to have some additional fields to serve as DB keys."
So he dismisses the speaking-key argument in one sentence as "purist DB
lore." He then proceeds with a poor example:
"The requirements for a primary key are very strict. It must:
Exist
Be unique
Not change over time
Surrogate keys help to mitigate the fact that real business data never
reliably fulfills these requirements. Not every person has a Social
Security Number (think of those outside the U.S.), people change their
names, and other important information."
1. The reason we have ON UPDATE CASCADE is to handle changes in primary
keys.
2. If not everyone has a social security number than the design should
be sufficiently normalized to reflect that fact.
I am not saying that Chris Date and Hugh Darwen are right and that
Philipp Janert is wrong. I am only saying that both sides should be
investigated and judged on the weight of their arguments.
Personally, I've found over time that when I deviate from "purist DB
lore" I get punished in long run.
Could be wrong, though. :-)
Mike Mascari
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Import Notes
Resolved by subject fallback
That article makes me want to vomit uncontrollably! ;-)
"Business data might also simply be bad -- glitches in the Social
Security Administration's system may lead to different persons having
the same Social Security Number. A surrogate key helps to isolate the
system from such problems."The surrogate key isn't solving the underlying logical inconsistency
problem. It is being used as a work-around to cover one up. I suspect
the author of being a MySQL user.
I think what he's saying is that an application bug, or a business
process problem, should not interfere with your database system.
Granted, two identical SSNs seems far fetched. However, if your business
screws up and you need to change someone's primary key, you've just
violated the principle of a primary key. You better be REALLY sure the
primary key will NEVER change over time for a given record, and that it
really is unique.
An SSN might fit that description, but there are always strange
situations. What if someone sues to have their SSN changed and a judge
orders it? If that's their PK, the social security administration is up
a creek (at least in the DB theory world, it probably wouldn't matter
much in practice).
Regards,
Jeff Davis
On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
<joking>
Apparently gamma functions and string theory have little to do with
understanding the relational model of data.</joking>
mmmmm.. string theory. :-)
--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON | www.planetargon.com
* Portland, OR | robby@planetargon.com
* 503.351.4730 | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
****************************************/
On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote:
On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
<joking>
Apparently gamma functions and string theory have little to do with
understanding the relational model of data.</joking>
mmmmm.. string theory. :-)
Ya you know the theory that states that the Database is really made up
of a large amount of strings. Some are even null terminated strings,
although most strings really have a quanta that can be found immediate
before the string. :-)
On Tue, 2004-10-26 at 22:03 -0500, Kevin Barnard wrote:
On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote:
On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
<joking>
Apparently gamma functions and string theory have little to do with
understanding the relational model of data.</joking>
mmmmm.. string theory. :-)
Ya you know the theory that states that the Database is really made up
of a large amount of strings. Some are even null terminated strings,
although most strings really have a quanta that can be found immediate
before the string. :-)
How do we SELECT the string so that we can observe it then? ;-)
--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON | www.planetargon.com
* Portland, OR | robby@planetargon.com
* 503.351.4730 | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
****************************************/
-----Original Message-----
From: Robby Russell [mailto:robby@planetargon.com]
Sent: Tue 10/26/2004 9:08 PM
To: Kevin Barnard
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] primary key and existing unique fields
On Tue, 2004-10-26 at 22:03 -0500, Kevin Barnard wrote:
On Tue, 26 Oct 2004 17:05:27 -0700, Robby Russell <robby@planetargon.com> wrote:
On Tue, 2004-10-26 at 17:26 -0400, Mike Mascari wrote:
<joking>
Apparently gamma functions and string theory have little to do with
understanding the relational model of data.</joking>
mmmmm.. string theory. :-)
Ya you know the theory that states that the Database is really made up
of a large amount of strings. Some are even null terminated strings,
although most strings really have a quanta that can be found immediate
before the string. :-)
How do we SELECT the string so that we can observe it then? ;-)
--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON | www.planetargon.com
* Portland, OR | robby@planetargon.com
* 503.351.4730 | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
****************************************/
You can't observe it ... only *infer* it.
Import Notes
Resolved by subject fallback
Sally Sally wrote:
This
existing unique field will have to be a character of fixed length
(VARCHAR(12)) because although it's a numeric value there will be
leading zeroes.
Plenty of people are contributing their tuppence-worth regarding the
choice of surrogate vs natural primary key.
Can I just point out that your existing unique field is EITHER a numeric
value OR it has a fixed number of characters - numbers don't have
leading zeros.
If what you have is a number, then perhaps consider int8/numeric types
and format appropriately when you display the values.
--
Richard Huxton
Archonet Ltd
On Wed, Oct 27, 2004 at 00:10:27 +0200,
Dawid Kuroczko <qnex42@gmail.com> wrote:
3. If you'll need things like "last 50 keys", you can SELECT * FROM
foo ORDER BY yourserialkey DESC LIMIT 50;
You really shouldn't be doing that if you are using sequences to generate
the key. Sequences are just guarenteed to return unique values, not to
return them in order. Because groups of sequences can be allocated to
a backend at once depending on a setting settable by a client, you can
get assignments out of order. Also for overlapping transactions what
the application means by the last 50 entries may not match what you
get when you get the 50 highest serial values.
Dawid,
I am interested in the first point you made that:
having varchar(12) in every referencing table, takes more storage
space.
The thing is though, if I have a serial primary key then it would be an
additional column. Or you are saying the space taken by a VARCHAR(12) field
is more than two INT fields? ( or is it the fact that when it is referenced
it will appear several times?) I guess the reason I am resisting the idea of
an additional primary key field is to avoid the additional lookup in some
queries. Perhaps it's a minor almost irrelevant performance factor.
Thanks
Sally
_________________________________________________________________
Is your PC infected? Get a FREE online computer virus scan from McAfee�
Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963
Import Notes
Resolved by subject fallback
On Thu, Oct 28, 2004 at 14:31:32 +0000,
Sally Sally <dedeb17@hotmail.com> wrote:
Dawid,
I am interested in the first point you made that:
having varchar(12) in every referencing table, takes more storage
space.
The thing is though, if I have a serial primary key then it would be an
additional column. Or you are saying the space taken by a VARCHAR(12) field
is more than two INT fields? ( or is it the fact that when it is referenced
it will appear several times?) I guess the reason I am resisting the idea
of an additional primary key field is to avoid the additional lookup in
some queries. Perhaps it's a minor almost irrelevant performance factor.
I think it is better to worry about what is going to make it easiest to
have clean data and to support future changes than worry about performance.
Over the long run hardware is cheaper than people.
I think the same too but sometimes it seems in the real world performance is
given more value than a properly designed db. Or the long term flexiblity is
not taken into account given the short term requirements.
regards
Sally
From: Bruno Wolff III <bruno@wolff.to>
To: Sally Sally <dedeb17@hotmail.com>
CC: pgsql-general@postgresql.org, qnex42@gmail.com
Subject: Re: [GENERAL] primary key and existing unique fields
Date: Thu, 28 Oct 2004 12:44:00 -0500On Thu, Oct 28, 2004 at 14:31:32 +0000,
Sally Sally <dedeb17@hotmail.com> wrote:Dawid,
I am interested in the first point you made that:
having varchar(12) in every referencing table, takes more storage
space.
The thing is though, if I have a serial primary key then it would be an
additional column. Or you are saying the space taken by a VARCHAR(12)field
is more than two INT fields? ( or is it the fact that when it is
referenced
it will appear several times?) I guess the reason I am resisting the
idea
of an additional primary key field is to avoid the additional lookup in
some queries. Perhaps it's a minor almost irrelevant performance factor.I think it is better to worry about what is going to make it easiest to
have clean data and to support future changes than worry about performance.
Over the long run hardware is cheaper than people.---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
_________________________________________________________________
Check out Election 2004 for up-to-date election news, plus voter tools and
more! http://special.msn.com/msn/election2004.armx
Import Notes
Resolved by subject fallback