Avoiding surrogate keys

Started by Thom Brownalmost 16 years ago21 messagesgeneral
Jump to latest
#1Thom Brown
thombrown@gmail.com

I think I know what I plan to do, but want to throw this out there to see if
there are differing points of view.

I have a mailing list table, and 2 of the columns contain values which have
to be from a list. These are country and status. There are 237 possible
countries and 3 possible statuses. Now I know some people would assign a
sequence ID (surrogate key) to the country and status values, and have them
looked up in separate tables to get the textual value, but I think I'll
still have those tables, just without an ID column, so 1 column for both the
countries and statuses tables. This means storing the proper value in the
main table.

So instead of

name, email, country, status
'mr smith', 'emailaddress@example.com', 44, 2
'mrs jones', 'me@emailcompany.com', 21, 1

I'd have

name, email, country, status
'mr smith', 'emailaddress@example.com', 'China', 'Registered'
'mrs jones', 'me@emailcompany.com', 'Belgium', 'Unconfirmed'

The values of course would be constrained by foreign key lookup to their
associated tables.

Are there any serious downsides to this? If so, what would you recommend?

Thanks

Thom

#2Bill Moran
wmoran@potentialtech.com
In reply to: Thom Brown (#1)
Re: Avoiding surrogate keys

In response to Thom Brown <thombrown@gmail.com>:

I think I know what I plan to do, but want to throw this out there to see if
there are differing points of view.

I have a mailing list table, and 2 of the columns contain values which have
to be from a list. These are country and status. There are 237 possible
countries and 3 possible statuses. Now I know some people would assign a
sequence ID (surrogate key) to the country and status values, and have them
looked up in separate tables to get the textual value, but I think I'll
still have those tables, just without an ID column, so 1 column for both the
countries and statuses tables. This means storing the proper value in the
main table.

So instead of

name, email, country, status
'mr smith', 'emailaddress@example.com', 44, 2
'mrs jones', 'me@emailcompany.com', 21, 1

I'd have

name, email, country, status
'mr smith', 'emailaddress@example.com', 'China', 'Registered'
'mrs jones', 'me@emailcompany.com', 'Belgium', 'Unconfirmed'

The values of course would be constrained by foreign key lookup to their
associated tables.

Are there any serious downsides to this? If so, what would you recommend?

I'd use an ENUM for the status, as that's not liable to change.

The only problem I see with avoiding the surrogate key for the country
is that the table might require more disk space if a lot of the country
names end up being very long.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

#3Joshua D. Drake
jd@commandprompt.com
In reply to: Bill Moran (#2)
Re: Avoiding surrogate keys

On Wed, 2010-04-21 at 15:07 -0400, Bill Moran wrote:

I'd use an ENUM for the status, as that's not liable to change.

The only problem I see with avoiding the surrogate key for the country
is that the table might require more disk space if a lot of the country
names end up being very long.

Yeah but that isn't generally a problem now and the gain he gets from
the lack of join performance is more than worth it.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

#4Rich Shepard
rshepard@appl-ecosys.com
In reply to: Thom Brown (#1)
Re: Avoiding surrogate keys

On Wed, 21 Apr 2010, Thom Brown wrote:

I have a mailing list table, and 2 of the columns contain values which
have to be from a list.

Thom,

From 2 lists?

These are country and status.

And each is from a separate list, correct?

There are 237 possible countries and 3 possible statuses.

Now I know some people would assign a sequence ID (surrogate key) to the
country and status values,

Why? These two fields do not define a unique row, do they? If not, then
neither is a candidate key and should be treated as a regular attribute.

... and have them looked up in separate tables to get the textual value,
but I think I'll still have those tables, just without an ID column, so 1
column for both the countries and statuses tables. This means storing the
proper value in the main table.

You could have a table with two columns: abbreviation and name. Then you
could copy that table from the Internet to get the data.

So instead of

name, email, country, status
'mr smith', 'emailaddress@example.com', 44, 2
'mrs jones', 'me@emailcompany.com', 21, 1

Not only is more work, but it's confusing and unnecessary.

I'd have

name, email, country, status
'mr smith', 'emailaddress@example.com', 'China', 'Registered'
'mrs jones', 'me@emailcompany.com', 'Belgium', 'Unconfirmed'

Sure.

Are there any serious downsides to this? If so, what would you recommend?

Nope. Not even flippant downsides.

Rich

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Thom Brown (#1)
Re: Avoiding surrogate keys

On Wed, Apr 21, 2010 at 3:01 PM, Thom Brown <thombrown@gmail.com> wrote:

I think I know what I plan to do, but want to throw this out there to see if
there are differing points of view.
I have a mailing list table, and 2 of the columns contain values which have
to be from a list.  These are country and status.  There are 237 possible
countries and 3 possible statuses.  Now I know some people would assign a
sequence ID (surrogate key) to the country and status values, and have them
looked up in separate tables to get the textual value, but I think I'll
still have those tables, just without an ID column, so 1 column for both the
countries and statuses tables.  This means storing the proper value in the
main table.
So instead of
name, email, country, status
'mr smith', 'emailaddress@example.com', 44, 2
'mrs jones', 'me@emailcompany.com', 21, 1
I'd have
name, email, country, status
'mr smith', 'emailaddress@example.com', 'China', 'Registered'
'mrs jones', 'me@emailcompany.com', 'Belgium', 'Unconfirmed'
The values of course would be constrained by foreign key lookup to their
associated tables.
Are there any serious downsides to this?  If so, what would you recommend?

Natural keys:
*) force formal relationships into your key design (this is good)
*) Make your database MUCH easier to follow, browse, and understand
*) in particular cases allow you to skip joins
*) will make your indexes fatter (this is not good)
*) can be a pain if your keys are updated frequently
*) can be a major pain if your key changes in structure (adds a field,
or changes in type)

Surrogate keys:
*) Give you faster joins, but more of them (this is a win/loss
depending on circumstances)
*) Tend to encourage lazy/poor designs, since you hide relationships
behind a value
*) Make the tables more difficult to browse and understand
*) Make updates to keys/key structure trivial

I personally use natural keys when I can and surrogates when I have
to. When I do use a surrogate, I tend to still define the natural key
as primary and simply make a alternate 'unique' constraint for the
surrogate.

merlin

#6Joshua D. Drake
jd@commandprompt.com
In reply to: Merlin Moncure (#5)
Re: Avoiding surrogate keys

On Wed, 2010-04-21 at 15:18 -0400, Merlin Moncure wrote:

Natural keys:
*) force formal relationships into your key design (this is good)
*) Make your database MUCH easier to follow, browse, and understand
*) in particular cases allow you to skip joins
*) will make your indexes fatter (this is not good)
*) can be a pain if your keys are updated frequently
*) can be a major pain if your key changes in structure (adds a field,
or changes in type)

Surrogate keys:
*) Give you faster joins, but more of them (this is a win/loss
depending on circumstances)

Kind of... natural keys can remove the need for the join in the first
place, which is certainly faster than joining :P

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

#7Thom Brown
thombrown@gmail.com
In reply to: Merlin Moncure (#5)
Re: Avoiding surrogate keys

On 21 April 2010 20:18, Merlin Moncure <mmoncure@gmail.com> wrote:

On Wed, Apr 21, 2010 at 3:01 PM, Thom Brown <thombrown@gmail.com> wrote:

I think I know what I plan to do, but want to throw this out there to see

if

there are differing points of view.
I have a mailing list table, and 2 of the columns contain values which

have

to be from a list. These are country and status. There are 237 possible
countries and 3 possible statuses. Now I know some people would assign a
sequence ID (surrogate key) to the country and status values, and have

them

looked up in separate tables to get the textual value, but I think I'll
still have those tables, just without an ID column, so 1 column for both

the

countries and statuses tables. This means storing the proper value in

the

main table.
So instead of
name, email, country, status
'mr smith', 'emailaddress@example.com', 44, 2
'mrs jones', 'me@emailcompany.com', 21, 1
I'd have
name, email, country, status
'mr smith', 'emailaddress@example.com', 'China', 'Registered'
'mrs jones', 'me@emailcompany.com', 'Belgium', 'Unconfirmed'
The values of course would be constrained by foreign key lookup to their
associated tables.
Are there any serious downsides to this? If so, what would you

recommend?

Natural keys:
*) force formal relationships into your key design (this is good)
*) Make your database MUCH easier to follow, browse, and understand
*) in particular cases allow you to skip joins
*) will make your indexes fatter (this is not good)
*) can be a pain if your keys are updated frequently
*) can be a major pain if your key changes in structure (adds a field,
or changes in type)

Surrogate keys:
*) Give you faster joins, but more of them (this is a win/loss
depending on circumstances)
*) Tend to encourage lazy/poor designs, since you hide relationships
behind a value
*) Make the tables more difficult to browse and understand
*) Make updates to keys/key structure trivial

I personally use natural keys when I can and surrogates when I have
to. When I do use a surrogate, I tend to still define the natural key
as primary and simply make a alternate 'unique' constraint for the
surrogate.

merlin

Thanks for the comments guys. I'm now pretty sure using natural keys is the
right approach. Neither list will ever change type, and if they change,
it'll be quite infrequent. Index size shouldn't really be a problem since
we're realistically talking thousands of rows.

What I hate about surrogate keys is having to keep looking stuff up:

INSERT INTO stuff (col_a, col_b)
SELECT 'my_value', table_b.id
FROM table_b
WHERE table_b.real_value = 'PostgreSQL';

As opposed to:

INSERT INTO stuff (col_a, col_b) VALUES ('my_value', 'PostgreSQL');

Just making sure I don't fall into the surrogate key abuse trap. :)

Thom

#8Bill Moran
wmoran@potentialtech.com
In reply to: Joshua D. Drake (#3)
Re: Avoiding surrogate keys

In response to "Joshua D. Drake" <jd@commandprompt.com>:

On Wed, 2010-04-21 at 15:07 -0400, Bill Moran wrote:

I'd use an ENUM for the status, as that's not liable to change.

The only problem I see with avoiding the surrogate key for the country
is that the table might require more disk space if a lot of the country
names end up being very long.

Yeah but that isn't generally a problem now and the gain he gets from
the lack of join performance is more than worth it.

Agreed. I guess I didn't explain properly: the disk space _may_ be an
issue if you have LOTS of REALLY LONG names.

Like, if the shortest name in your country table is 'Federated States
of Micronesia (Esquire)', then the disk space used by the table and
index might become an issue.

Also, if this is an embedded application where disk is at a premium ...

As with many things, "big" and "small" are relative, ambiguous and
copy-written by someone who thinks they can charge me every time I
use them.

One thing that a lot of people seem to get confused about is that they
subconsciously think that ints or bigints take up less space when the
numbers are small. I.e.: I want to use an int for my state identifier
instead of the 2-digit code, because it will use less space -- wrong,
an int is 4 bytes, but a 2 byte char column is 1/2 that ... even if the
number never gets higher than 50.

Personally, I think the only good reason to use a surrogate key is when
there's a high likelihood that your primary text identifier might change.
Although ON UPDATE CASCADE can even handle that, it just might take a
while.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bill Moran (#8)
Re: Avoiding surrogate keys

Bill Moran escribi�:

One thing that a lot of people seem to get confused about is that they
subconsciously think that ints or bigints take up less space when the
numbers are small. I.e.: I want to use an int for my state identifier
instead of the 2-digit code, because it will use less space -- wrong,
an int is 4 bytes, but a 2 byte char column is 1/2 that ... even if the
number never gets higher than 50.

Eh, a 2 byte char column uses 3 bytes -- there's one byte of overhead.
(Unless one of the chars is multibyte in which case it can be longer).
Earlier versions of Postgres use 6 bytes to store the 2 chars (4 bytes
of overhead), so it would be larger than the int.

Not that this invalidates the argument -- just nitpicking here.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Merlin Moncure
mmoncure@gmail.com
In reply to: Bill Moran (#8)
Re: Avoiding surrogate keys

On Wed, Apr 21, 2010 at 3:32 PM, Bill Moran <wmoran@potentialtech.com> wrote:

In response to "Joshua D. Drake" <jd@commandprompt.com>:

On Wed, 2010-04-21 at 15:07 -0400, Bill Moran wrote:

I'd use an ENUM for the status, as that's not liable to change.

The only problem I see with avoiding the surrogate key for the country
is that the table might require more disk space if a lot of the country
names end up being very long.

Yeah but that isn't generally a problem now and the gain he gets from
the lack of join performance is more than worth it.

Agreed.  I guess I didn't explain properly: the disk space _may_ be an
issue if you have LOTS of REALLY LONG names.

Like, if the shortest name in your country table is 'Federated States
of Micronesia (Esquire)', then the disk space used by the table and
index might become an issue.

Also, if this is an embedded application where disk is at a premium ...

As with many things, "big" and "small" are relative, ambiguous and
copy-written by someone who thinks they can charge me every time I
use them.

One thing that a lot of people seem to get confused about is that they
subconsciously think that ints or bigints take up less space when the
numbers are small.  I.e.: I want to use an int for my state identifier
instead of the 2-digit code, because it will use less space -- wrong,
an int is 4 bytes, but a 2 byte char column is 1/2 that ... even if the
number never gets higher than 50.

Personally, I think the only good reason to use a surrogate key is when
there's a high likelihood that your primary text identifier might change.
Although ON UPDATE CASCADE can even handle that, it just might take a
while.

Another semi-related case is when the keys are large which in turn
gives you larger indexes. Large indexes pressure your cache which in
turn gives you more disk faults which can really nail you if your
database is large. Both of these reasons fall away if/when storage
catches up with the rest of hardware IMO.

Another penalty of surrogate keys I neglected to mention is more
sorts. It's typical to want to pull data in natural order which is
obfuscated behind the surrogate key. So besides giving you a 'free
join', you get a free sort as well if you pull data off the index.

Interesting aside: I would estimate that a fairly large percentage of
-performance problems (say, 25%) can be directly traced to poor
indexing strategy and loose relationships between tables. Natural
keys tend to deflect both of those problems.

merlin

#11Martin Gainty
mgainty@hotmail.com
In reply to: Bill Moran (#2)
Re: Avoiding surrogate keys

bill-

static information such as country names *should be* de-normalised into non-indexed columns of the driving table as you have already done

if on the other hand the column information country names were changing were dynamic then you would want to keep them in their respective table

as country code 001 will always be US (and the remaining countries and country code will never change)
i would suggest keeping the full name in the driving table (same goes with state/province data btw)

it also saves the database the I/O and CPU from having to do a lookup on another table

my 2 cents
martin-
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.

Date: Wed, 21 Apr 2010 15:07:35 -0400
From: wmoran@potentialtech.com
To: thombrown@gmail.com
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Avoiding surrogate keys

In response to Thom Brown <thombrown@gmail.com>:

I think I know what I plan to do, but want to throw this out there to see if
there are differing points of view.

I have a mailing list table, and 2 of the columns contain values which have
to be from a list. These are country and status. There are 237 possible
countries and 3 possible statuses. Now I know some people would assign a
sequence ID (surrogate key) to the country and status values, and have them
looked up in separate tables to get the textual value, but I think I'll
still have those tables, just without an ID column, so 1 column for both the
countries and statuses tables. This means storing the proper value in the
main table.

So instead of

name, email, country, status
'mr smith', 'emailaddress@example.com', 44, 2
'mrs jones', 'me@emailcompany.com', 21, 1

I'd have

name, email, country, status
'mr smith', 'emailaddress@example.com', 'China', 'Registered'
'mrs jones', 'me@emailcompany.com', 'Belgium', 'Unconfirmed'

The values of course would be constrained by foreign key lookup to their
associated tables.

Are there any serious downsides to this? If so, what would you recommend?

I'd use an ENUM for the status, as that's not liable to change.

The only problem I see with avoiding the surrogate key for the country
is that the table might require more disk space if a lot of the country
names end up being very long.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

_________________________________________________________________
Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_1

#12Mark Watson
mark.watson@jurisconcept.ca
In reply to: Martin Gainty (#11)
Re: Avoiding surrogate keys

I agree, as long as one needs the country names in only one language.
-Mark
(Sorry Martin- forgot to "Reply to all" the last time)
________________________________________
De : pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] De la part de Martin Gainty
Envoyé : 21 avril 2010 16:38 À : wmoran@potentialtech.com Cc :
pgsql-general@postgresql.org Objet : Re: [GENERAL] Avoiding surrogate keys

...

static information such as country names *should be* de-normalised into
non-indexed columns of the driving table as you have already done

if on the other hand the column information country names were changing were
dynamic then you would want to keep them in their respective table

as country code 001 will always be US (and the remaining countries and
country code will never change) i would suggest keeping the full name in the
driving table (same goes with state/province data btw)

...

#13Philippe Lang
philippe.lang@attiksystem.ch
In reply to: Thom Brown (#1)
Re: Avoiding surrogate keys

Hi,

I think nobody mentioned Object-Relational mappers. If you intend to used one (or think you may be using one in the future), using surrogate keys is more straightforward, if not necessary.

Best regards,

-------------------------------------------------------------
Attik System web : http://www.attiksystem.ch
Philippe Lang phone: +41 26 422 13 75
rte de la Fonderie 2 gsm : +41 79 351 49 94
1700 Fribourg pgp : http://keyserver.pgp.com

Show quoted text

-----Message d'origine-----
De : pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] De la part de Thom Brown
Envoyé : mercredi 21 avril 2010 21:01
À : PGSQL Mailing List
Objet : [GENERAL] Avoiding surrogate keys

I think I know what I plan to do, but want to throw this out there to
see if there are differing points of view.

I have a mailing list table, and 2 of the columns contain values which
have to be from a list. These are country and status. There are 237
possible countries and 3 possible statuses. Now I know some people
would assign a sequence ID (surrogate key) to the country and status
values, and have them looked up in separate tables to get the textual
value, but I think I'll still have those tables, just without an ID
column, so 1 column for both the countries and statuses tables. This
means storing the proper value in the main table.

So instead of

name, email, country, status
'mr smith', 'emailaddress@example.com', 44, 2
'mrs jones', 'me@emailcompany.com', 21, 1

I'd have

name, email, country, status
'mr smith', 'emailaddress@example.com', 'China', 'Registered'
'mrs jones', 'me@emailcompany.com', 'Belgium', 'Unconfirmed'

The values of course would be constrained by foreign key lookup to
their associated tables.

Are there any serious downsides to this? If so, what would you
recommend?

Thanks

Thom

#14Lew
noone@lwsc.ehost-services.com
In reply to: Philippe Lang (#13)
Re: Avoiding surrogate keys

Philippe Lang wrote:

I think nobody mentioned Object-Relational mappers.
If you intend to used one (or think you may be using one in the future),
using surrogate keys is more straightforward, if not necessary.

Neither of those claims is even slightly true. Using Hibernate, EclipseLink
or OpenJPA (for Java applications), natural keys are sufficient and far more
straightforward than surrogate keys.

--
Lew

#15Merlin Moncure
mmoncure@gmail.com
In reply to: Lew (#14)
Re: Avoiding surrogate keys

On Sat, May 1, 2010 at 12:09 PM, Lew <noone@lwsc.ehost-services.com> wrote:

Philippe Lang wrote:

I think nobody mentioned Object-Relational mappers.
If you intend to used one (or think you may be using one in the future),
using surrogate keys is more straightforward, if not necessary.

Neither of those claims is even slightly true.  Using Hibernate, EclipseLink
or OpenJPA (for Java applications), natural keys are sufficient and far more
straightforward than surrogate keys.

right -- to be fair though is quite a bit of (generally bad) software
out there that assumes or at least heavily encourages surrogate keys.

merlin

#16John R Pierce
pierce@hogranch.com
In reply to: Thom Brown (#1)
Re: Avoiding surrogate keys

If your 'natural key' is a large text field, I'd have to assume there's
some point at which a surrogate index would be more efficient. Would
this be above a few dozen characters, or a few 100 characters? I
wouldn't want a PK based on a multi-K byte text field for a table that
has many 10s or 100s of 1000s of rows, for sure.

#17Merlin Moncure
mmoncure@gmail.com
In reply to: John R Pierce (#16)
Re: Avoiding surrogate keys

On Sat, May 1, 2010 at 4:14 PM, John R Pierce <pierce@hogranch.com> wrote:

If your 'natural key' is a large text field, I'd have to assume there's some
point at which a surrogate index would be more efficient.  Would this be
above a few dozen characters, or a few 100 characters?   I wouldn't want a
PK based on a multi-K byte text field for a table that has many 10s or 100s
of 1000s of rows, for sure.

Well, yes, but:
*) most natural keys are small, between 4-10 bytes
*) regardless of the data type of the key, the number of btree lookups
is going to remain approximately the same
*) you have to (or at least should) put the index on anyway as unique
constraint. you do pay the price in dependent tables however. so the
natural _primary_ key is free. it's the foreign keys where you pay.

the two main performance issues with natural keys are this (you kinda
touched on one):
*) the index is fatter, pressuring cache
It's not so much the comparison function but the fact that the larger
index(es) require more memory. If drives were faster than they were
this wouldn't matter as much -- I expect this to become less of a
factor as SSD technology improves. This can somewhat modulated by
clustering the index...you get a better chance of reading multiple
relevant records on a single page.
*) cascading updates
If your key is in a lot of places and has to be updated it can cause a
mess. Lots of locks, dead space, vacuuming, etc. Most of the time
primary keys don't change very much but if they do you had better give
it fair consideration.

Natural keys have a lot of performance advantages as mentioned upthread.

merlin

#18Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#17)
Re: Avoiding surrogate keys

On Tue, May 4, 2010 at 9:40 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Sat, May 1, 2010 at 4:14 PM, John R Pierce <pierce@hogranch.com> wrote:

If your 'natural key' is a large text field, I'd have to assume there's some
point at which a surrogate index would be more efficient.  Would this be
above a few dozen characters, or a few 100 characters?   I wouldn't want a
PK based on a multi-K byte text field for a table that has many 10s or 100s
of 1000s of rows, for sure.

one more note about this. if you truly have a situation where a multi
kilobyte chunk of data is the key, you can always digest it and use
that. you lose the natural ordering -- but in these type of cases it
usually doesn't matter.

merlin

#19Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Merlin Moncure (#18)
Re: Avoiding surrogate keys

On Tue, May 4, 2010 at 3:16 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Tue, May 4, 2010 at 9:40 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Sat, May 1, 2010 at 4:14 PM, John R Pierce <pierce@hogranch.com> wrote:

If your 'natural key' is a large text field, I'd have to assume there's some
point at which a surrogate index would be more efficient.  Would this be
above a few dozen characters, or a few 100 characters?   I wouldn't want a
PK based on a multi-K byte text field for a table that has many 10s or 100s
of 1000s of rows, for sure.

one more note about this.  if you truly have a situation where a multi
kilobyte chunk of data is the key, you can always digest it and use
that.  you lose the natural ordering -- but in these type of cases it
usually doesn't matter.

99% of all cases, where I introduced bigint surrogate key, in tables
that hold account, customers, and that sort of bollocks, were straight
wins, sometimes measuring 3-4x.

Another thing, If your PK changes, it is no longer a PK, you can't rely on it.

--
GJ

#20Richard Broersma
richard.broersma@gmail.com
In reply to: Grzegorz Jaśkiewicz (#19)
Re: Avoiding surrogate keys

2010/5/4 Grzegorz Jaśkiewicz <gryzman@gmail.com>:

Another thing, If your PK changes, it is no longer a PK, you can't rely on it.

Depending upon what you mean by changes this could be true or it could
be a matter of opinion. If your referring to a candidate key's value
changes, this key still provides a useful way to identify a tuple.

Also, just because a surrogate key is a useful way to identify a row,
by itself it does nothing to uniquely identify an entity that a row
represents. On some DBMS forums that I frequent, it is a daily
occurrence to see questions on how to eliminate duplicate rows. So,
for some people, there are times when even surrogate keys cannot be
relied upon as a PK.
--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

#21Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Richard Broersma (#20)