How to modify ENUM datatypes?
All,
In the past I have used foreign keys to lookup tables for small lists of
values that I now think ENUM could do the job of. I was hoping that by
using ENUM, I would avoid having to do joins in my queries, and that I'd
be making the data more robust and faster.
I used to have a table for account_status:
A | Active
B | Billing Failed
C | Closed
D | Deactivated
account.acct_type CHAR references account_type.acct_type CHAR
But, now I've converted that to an ENUM:
ACTIVE
BILLING_FAILED
CLOSED
DEACTIVATED
account.acct_type ENUM account_type
The problem is that once I create a column in my account table that uses
this 'account_type' datatype, I can't seem to change or add to it any
more. I want to add a new value or edit/delete an existing one.
How do you make changes to an ENUM datatype that is already in use?
-- Dante
D. Dante Lorenso wrote:
The problem is that once I create a column in my account table that
uses this 'account_type' datatype, I can't seem to change or add to it
any more. I want to add a new value or edit/delete an existing one.How do you make changes to an ENUM datatype that is already in use?
As far as I know ENUM is not well suited to uses where new enumeration
members may be added later. A lookup table and a foreign key is probably
better for this sort of use.
You *can* add elements to an enum type - sort of - by creating a new
type and converting columns. It's ugly, though, and will be hard to get
right when the column of interest is referenced by foreign keys and such.
One way to do it if you really must:
-- Starting state
CREATE TYPE et1 AS ENUM('yes','no');
CREATE TABLE testtab (
a et
);
INSERT INTO testtab (a) values ('yes');
-- Change
CREATE TYPE et2 AS ENUM('yes','no','filenotfound');
ALTER TABLE testtab ALTER COLUMN a TYPE et2 USING (
CASE a WHEN 'yes' THEN 'yes'::et2 WHEN 'no' THEN 'no'::et2 END
);
-- Alternative ALTER that's suitable if you're not removing anything
from the enum
ALTER TABLE testtab ALTER COLUMN a TYPE et2 USING ( a::text::et2 );
Personally, though, I'd stick to the good 'ol lookup table and foreign key.
--
Craig Ringer
On Tue, 22 Apr 2008 20:58:30 +0800 Craig Ringer wrote:
D. Dante Lorenso wrote:
The problem is that once I create a column in my account table that
uses this 'account_type' datatype, I can't seem to change or add to it
any more. I want to add a new value or edit/delete an existing one.How do you make changes to an ENUM datatype that is already in use?
As far as I know ENUM is not well suited to uses where new enumeration
members may be added later. A lookup table and a foreign key is probably
better for this sort of use.
I remember the discussions before PG implemented ENUMs at all - some
people voted against this "feature" because they knew that questions
about modifing the enum values would pop up sooner or later.
You *can* add elements to an enum type - sort of - by creating a new
type and converting columns. It's ugly, though, and will be hard to get
right when the column of interest is referenced by foreign keys and such.
If there's one request to modify a specific ENUM column, other requests
will follow because the enum set is not as static as it seems in the
first place. So i beg that the only advise should be "change the column
to a foreign key construct". Converting columns to new data types is
much more overhead anyway.
Kind regards
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
Andreas 'ads' Scherbaum wrote:
On Tue, 22 Apr 2008 20:58:30 +0800 Craig Ringer wrote:
D. Dante Lorenso wrote:
The problem is that once I create a column in my account table that
uses this 'account_type' datatype, I can't seem to change or add to it
any more. I want to add a new value or edit/delete an existing one.
How do you make changes to an ENUM datatype that is already in use?As far as I know ENUM is not well suited to uses where new enumeration
members may be added later. A lookup table and a foreign key is probably
better for this sort of use.I remember the discussions before PG implemented ENUMs at all - some
people voted against this "feature" because they knew that questions
about modifing the enum values would pop up sooner or later.You *can* add elements to an enum type - sort of - by creating a new
type and converting columns. It's ugly, though, and will be hard to get
right when the column of interest is referenced by foreign keys and such.If there's one request to modify a specific ENUM column, other requests
will follow because the enum set is not as static as it seems in the
first place. So i beg that the only advise should be "change the column
to a foreign key construct". Converting columns to new data types is
much more overhead anyway.
So, the advice here is "don't use ENUM"?
I was really hoping that it would be more efficient to not have to do
all the foreign keys and joins for tables that may have 4-5 enum types.
Just being able to:
SELECT *
FROM tablename
would be nice if my columns contained enums instead of doing:
SELECT *
FROM tablename, lookuptable
WHERE tablename.some_id = lookuptable.some_id
Isn't the join more expensive?
-- Dante
On Apr 22, 2008, at 1:45 PM, D. Dante Lorenso wrote:
So, the advice here is "don't use ENUM"?
I think it's more "Don't use ENUM for a type that you are planning to
extend."
On Tue, Apr 22, 2008 at 3:31 PM, D. Dante Lorenso <dante@lorenso.com> wrote:
All,
In the past I have used foreign keys to lookup tables for small lists of
values that I now think ENUM could do the job of. I was hoping that by
using ENUM, I would avoid having to do joins in my queries, and that I'd be
making the data more robust and faster.I used to have a table for account_status:
A | Active
B | Billing Failed
C | Closed
D | Deactivatedaccount.acct_type CHAR references account_type.acct_type CHAR
But, now I've converted that to an ENUM:
ACTIVE
BILLING_FAILED
CLOSED
DEACTIVATEDaccount.acct_type ENUM account_type
The problem is that once I create a column in my account table that uses
this 'account_type' datatype, I can't seem to change or add to it any more.
I want to add a new value or edit/delete an existing one.How do you make changes to an ENUM datatype that is already in use?
I agree with others that ENUMs stop looking pretty when you need to modify
them...
Here's a thread from recent past where this exact problem was discussed...
maybe it'll interest you...
http://archives.postgresql.org/pgsql-general/2007-12/msg01363.php
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
On Tue, Apr 22, 2008 at 2:45 PM, D. Dante Lorenso <dante@lorenso.com> wrote:
So, the advice here is "don't use ENUM"?
I was really hoping that it would be more efficient to not have to do all
the foreign keys and joins for tables that may have 4-5 enum types.Just being able to:
SELECT *
FROM tablename
If you use a "lookup table" methodology you still get that. Try this:
smarlowe=# create table choices (color text primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"choices_pkey" for table "choices"
CREATE TABLE
smarlowe=# insert into choices values ('red'),('yellow'),('green'),('blue');
INSERT 0 4
smarlowe=# create table mystuff (id serial primary key, usenam text,
mycolor text references choices(color));
NOTICE: CREATE TABLE will create implicit sequence "mystuff_id_seq"
for serial column "mystuff.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"mystuff_pkey" for table "mystuff"
CREATE TABLE
smarlowe=# insert into mystuff (usenam, mycolor) values ('scott','red');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('darren','blue');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('dan','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('steve','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('mike','black');
ERROR: insert or update on table "mystuff" violates foreign key
constraint "mystuff_mycolor_fkey"
DETAIL: Key (mycolor)=(black) is not present in table "choices".
smarlowe=# select * from mystuff;
id | usenam | mycolor
----+--------+---------
1 | scott | red
2 | darren | blue
3 | dan | green
4 | steve | green
(4 rows)
tada! No enum, and no join. But you can't insert illegal values in mycolor...
On Tue, 22 Apr 2008 15:45:39 -0500
"D. Dante Lorenso" <dante@lorenso.com> wrote:
I was really hoping that it would be more efficient to not have to do
all the foreign keys and joins for tables that may have 4-5 enum
types.Just being able to:
SELECT *
FROM tablenamewould be nice if my columns contained enums instead of doing:
SELECT *
FROM tablename, lookuptable
WHERE tablename.some_id = lookuptable.some_idIsn't the join more expensive?
You were using natural keys, the join would not be required.
Joshua D. Drake
-- Dante
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Scott Marlowe wrote:
On Tue, Apr 22, 2008 at 2:45 PM, D. Dante Lorenso <dante@lorenso.com> wrote:
So, the advice here is "don't use ENUM"?
I was really hoping that it would be more efficient to not have to do all
the foreign keys and joins for tables that may have 4-5 enum types.
Just being able to:
SELECT *
FROM tablenameIf you use a "lookup table" methodology you still get that. Try this:
smarlowe=# create table choices (color text primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"choices_pkey" for table "choices"
CREATE TABLE
smarlowe=# insert into choices values ('red'),('yellow'),('green'),('blue');
INSERT 0 4
smarlowe=# create table mystuff (id serial primary key, usenam text,
mycolor text references choices(color));
NOTICE: CREATE TABLE will create implicit sequence "mystuff_id_seq"
for serial column "mystuff.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"mystuff_pkey" for table "mystuff"
CREATE TABLE
smarlowe=# insert into mystuff (usenam, mycolor) values ('scott','red');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('darren','blue');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('dan','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('steve','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('mike','black');
ERROR: insert or update on table "mystuff" violates foreign key
constraint "mystuff_mycolor_fkey"
DETAIL: Key (mycolor)=(black) is not present in table "choices".
smarlowe=# select * from mystuff;
id | usenam | mycolor
----+--------+---------
1 | scott | red
2 | darren | blue
3 | dan | green
4 | steve | green
(4 rows)
tada! No enum, and no join. But you can't insert illegal values in mycolor...
This approach is so old-school, I seem to have overlooked the obvious.
Here you've elected to use the foreign key to just control the possible
values inserted but not really to look up the value.
Seems you are storing the values in text form which goes against all the
normalization techniques I've learned in school. I see this might be a
problem with storage since you will need to store the TEXT value for
every row in the 'mystuff' table instead of just storing the reference
to the lookup table as an INTEGER. Over millions of rows, perhaps this
would become a concern?
What is the general consensus by the community about this approach? Is
this de-normalization frowned upon, or is there a performance advantage
here that warrants the usage?
-- Dante
Show quoted text
On Tue, 2008-04-22 at 16:34 -0500, D. Dante Lorenso wrote:
I see this might be a
problem with storage since you will need to store the TEXT value for
every row in the 'mystuff' table instead of just storing the reference
to the lookup table as an INTEGER. Over millions of rows, perhaps this
would become a concern?
It does use additional storage to store the full text value, rather than
a fixed-size integer. However, the difference is not much when the
average string length is short.
If you store an integer reference instead, joins are not necessarily
expensive. If the number of distinct values is small (which is the
normal use case for ENUM), I would expect the joins to be quite cheap.
Beware of running into bad plans however, or making the optimizer work
too hard (if you have a lot of other joins, too).
I don't think the performance concerns are major, but worth considering
if you have millions of rows.
What is the general consensus by the community about this approach? Is
this de-normalization frowned upon, or is there a performance advantage
here that warrants the usage?
This is not de-normalization, at all. Normalization is a formal process,
and if this were de-normalization, you could find a specific rule that
is violated by this approach.
Look here:
http://en.wikipedia.org/wiki/Third_normal_form
If you go to higher normal forms, you will not find any violations
there, either. There is nothing about normalization that requires the
use of surrogate keys.
The approach suggested by Scott Marlowe is normalized as well as being
quite natural and simple. I think often this is overlooked as being "too
simple", but it's a quite good design in many cases.
Regards,
Jeff Davis
On Tue, Apr 22, 2008 at 6:11 PM, Jeff Davis <pgsql@j-davis.com> wrote:
If you store an integer reference instead, joins are not necessarily
expensive. If the number of distinct values is small (which is the
normal use case for ENUM), I would expect the joins to be quite cheap.
Beware of running into bad plans however, or making the optimizer work
too hard (if you have a lot of other joins, too).
Necessarily being the operative word here. Think about an enum as
part of a composite key for example. It's a lot nicer to rely on enum
for natural ordering than doing something like a functional index.
Anyways, it's pretty easy to extend an enum...you can manually insert
an entry into pg_enum (see the relevent docs). Just watch out for oid
overlap. One thing currently that is very difficult currently to do
is to alter the order of the enum elements. The current state of
things is pretty workable though.
Scott's color/mystuff example is generally preferred for a lot of
cases. I _really_ prefer this to surrogate style enums where you have
color_id...this approach makes your database unreadable IMO. A decent
hybrid approach which I have been using lately is "char" (not char)
where the choices set is reasonably small, well represented by a
single character, and the intrinsic ordering property is not too
important (where an enum might be better). In many cases though, the
pure natural approach is simply the best. The enum though with is
intrinsic ordering and more efficient indexing has an important niche
however.
merlin
On Tue, 22 Apr 2008 15:45:39 -0500 D. Dante Lorenso wrote:
Andreas 'ads' Scherbaum wrote:
So, the advice here is "don't use ENUM"?
Yes. You should/can use ENUM for something like 'gender':
male, female, unknown. You don't need to add other values ever (yeah, i
skipped some special cases).
But if you use ENUM for color names (as example), it's easy to imagine
that someone comes around and requests a new color to be added to the
list. Here you should use a lookup table and a foreign key instead.
It always depends on the situation. The real problem is that people
start creating the table with ENUM and "oh, nice, this makes it easy"
in mind. But usually things are not as easy as it looks on first sight,
so they later run into trouble.
Kind regards
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
On Wed, Apr 23, 2008 at 12:38:48PM +0200, Andreas 'ads' Scherbaum wrote:
Yes. You should/can use ENUM for something like 'gender':
male, female, unknown. You don't need to add other values ever (yeah, i
skipped some special cases).
I was gonna say ! :-)
Add
hermaphrodite
transgender with female phenotype
transgender with male phenotype
and you should be set from current medical science's point
of view ;-)
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Wednesday 23 April 2008 14:10, Karsten Hilbert wrote:
On Wed, Apr 23, 2008 at 12:38:48PM +0200, Andreas 'ads' Scherbaum wrote:
Yes. You should/can use ENUM for something like 'gender':
male, female, unknown. You don't need to add other values ever (yeah, i
skipped some special cases).I was gonna say ! :-)
Add
hermaphrodite
transgender with female phenotype
transgender with male phenotypeand you should be set from current medical science's point
of view ;-)
The standard is unknown, male, female, and n/a.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Wed, Apr 23, 2008 at 2:51 PM, Robert Treat
<xzilla@users.sourceforge.net> wrote:
On Wednesday 23 April 2008 14:10, Karsten Hilbert wrote:
On Wed, Apr 23, 2008 at 12:38:48PM +0200, Andreas 'ads' Scherbaum wrote:
Yes. You should/can use ENUM for something like 'gender':
male, female, unknown. You don't need to add other values ever (yeah, i
skipped some special cases).I was gonna say ! :-)
Add
hermaphrodite
transgender with female phenotype
transgender with male phenotypeand you should be set from current medical science's point
of view ;-)The standard is unknown, male, female, and n/a.
Both unknown and n/a sounds like NULL to me.
--
Rob Wultsch
wultsch@gmail.com
wultsch (aim)
On Wed, Apr 23, 2008 at 12:10 PM, Karsten Hilbert
<Karsten.Hilbert@gmx.net> wrote:
On Wed, Apr 23, 2008 at 12:38:48PM +0200, Andreas 'ads' Scherbaum wrote:
Yes. You should/can use ENUM for something like 'gender':
male, female, unknown. You don't need to add other values ever (yeah, i
skipped some special cases).I was gonna say ! :-)
Add
hermaphrodite
transgender with female phenotype
transgender with male phenotypeand you should be set from current medical science's point
of view ;-)
Actually, hermaphrodite specifies complete male and female genitalia,
which is impossible in humans. While various forms of address are
available for people born with parts of both male and female
genitalia, the most common and easiest is intersex.
The folks here: http://www.isna.org/faq/ have a lot more to say about
it, and seeing as how many of them ARE intersex, I'd leave it up to
them.
While most transgender folks prefer to be referred to as their assumed
gender, there are some gender queer folks who prefer other forms of
address.
I would put it that gender is not so easily defined, which makes it a
poor choice for enum.
Scott Marlowe wrote:
On Wed, Apr 23, 2008 at 12:10 PM, Karsten Hilbert
<Karsten.Hilbert@gmx.net> wrote:On Wed, Apr 23, 2008 at 12:38:48PM +0200, Andreas 'ads' Scherbaum wrote:
Yes. You should/can use ENUM for something like 'gender':
male, female, unknown. You don't need to add other values ever (yeah, i
skipped some special cases).I was gonna say ! :-)
Add
hermaphrodite
transgender with female phenotype
transgender with male phenotypeand you should be set from current medical science's point
of view ;-)Actually, hermaphrodite specifies complete male and female genitalia,
which is impossible in humans. While various forms of address are
available for people born with parts of both male and female
genitalia, the most common and easiest is intersex.The folks here: http://www.isna.org/faq/ have a lot more to say about
it, and seeing as how many of them ARE intersex, I'd leave it up to
them.
Scott, there's absolutely nothing in that FAQ about their database, let
alone whether or not they use ENUMs.
(heh)
While most transgender folks prefer to be referred to as their assumed
gender, there are some gender queer folks who prefer other forms of
address.I would put it that gender is not so easily defined, which makes it a
poor choice for enum.
Absolutely true. Which is odd, because this example is trotted out
whenever there's a thread about ENUMs.
On Wed, 23 Apr 2008 17:18:12 -0600 Scott Marlowe wrote:
I would put it that gender is not so easily defined, which makes it a
poor choice for enum.
That's why my original statement had the additional note about special
cases.
If you write an address book you normally don't want to add information
like transgender - just 'male', 'female', 'unknown' seems enough here.
If you define a medical database, you have to extend the ENUM values a
bit, but even then you know the possible values in advance before
creating the ENUM - just that you have some more choices than in your
address book.
It all depends on the use case, just like ENUM itself.
Kind regards
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
On Wed, Apr 23, 2008 at 07:46:07PM -0400, brian wrote:
Absolutely true. Which is odd, because this example is trotted out
whenever there's a thread about ENUMs.
I don't think it's odd at all. In my view, the people who think enums are a
good datatype for databases are exactly the sorts who'd think that their
data is as static as this poor understanding of the vagaries of individuals'
sex (gender is a different problem, given its association with social roles)
would suggest.
The world moves around in unexpected ways. Your data model needs to
accommodate itself to the world, because the alternative is not going to
happen.
A
On Thu, Apr 24, 2008 at 9:39 AM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Wed, Apr 23, 2008 at 07:46:07PM -0400, brian wrote:
Absolutely true. Which is odd, because this example is trotted out
whenever there's a thread about ENUMs.I don't think it's odd at all. In my view, the people who think enums are a
good datatype for databases are exactly the sorts who'd think that their
data is as static as this poor understanding of the vagaries of individuals'
sex (gender is a different problem, given its association with social roles)
would suggest.The world moves around in unexpected ways. Your data model needs to
accommodate itself to the world, because the alternative is not going to
happen.
Mostly true, but there are systems where certain parts really do tend
to be static over very long periods, and for those, I might admit to
ENUM being an answer. for instance, in the USAF, all units being
worked on in our avionics shop were in one of three categories,
Awaiting Maintenance, Awaiting Parts, or In Work. They were
abbreviated AWM, AWP, and INW.
That was back in the 1980s. I'm willing to bet it hasn't changed
since then. For those types of problems, enums make a certain amount
of sense, especially if you're tracking thousands of line units being
worked on every hour of every day across the US. the small saving in
space adds up fast. But that's an artificially constructed set of
choices.