Normalize or not?

Started by Perry Smith · almost 15 years ago · 6 messages · general
#1 Perry Smith
pedzsan@gmail.com

Based upon my reading of Wikipedia, the solution I think I want to implement is not in second normal form. I'm wondering if anyone on this list has suggestions.

I have a table called Containers in which object A contains object B. A middle column describes the type of association. For example, the Account team will have Joe as a member and Fred as the lead. This starts off with:

Containers:
Left     How     Right
Account  member  Joe
Account  lead    Fred

There is another table (which I've talked about before) which tells me that "Account" is a "Team". I call this the Names table; it has two columns, name and type. So:

Names:
Name     Type
Account  Team
Joe      User
Fred     User

in this case. I want to restrict the types of association: e.g. I want teams to contain users but not allow users to contain teams. And I want some types of association not to apply to others; for example, a Group can simply "contain" a Team, but a Team cannot contain a Group. Thus, I have a third table with entries like:

Types of Associations:
LType  How      RType
Team   member   User
Team   lead     User
Group  contain  Team

There is a foreign key constraint so that Containers(Left) and Containers(Right) must be in Names(Name) (I hope that syntax makes sense). But now comes the hard part.

For each row in Containers, I want the combination of the Type from Names associated with the Left, Containers(How), and the Type from Names associated with the Right to be an entry in the Types of Associations table. For example, Account member Joe would translate to Team member User, which is in the Types of Associations table, so it is a valid entry. But I don't believe I can do this with foreign key constraints. If I can, then stop me here and tell me how.

The solution I am considering is to add the LType and RType into Containers, so now Containers becomes:

Containers:
Left     LType  How     Right  RType
Account  Team   member  Joe    User
Account  Team   lead    Fred   User

and now my foreign key constraint is simply that Containers(LType), Containers(How), Containers(RType) be in Types of Associations. I would also change my constraint so that Left, LType of Containers must be in Names, as well as Right, RType.
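That composite foreign-key arrangement can be sketched concretely. The sketch below uses SQLite instead of PostgreSQL only so it is self-contained and runnable; the tables mirror the post (with Left/Right renamed lft/rgt to avoid SQL keywords), and the constraints translate directly to PostgreSQL REFERENCES clauses.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
db.executescript("""
CREATE TABLE names (
    name TEXT,
    type TEXT,
    PRIMARY KEY (name, type)
);
CREATE TABLE types_of_associations (
    ltype TEXT,
    how   TEXT,
    rtype TEXT,
    PRIMARY KEY (ltype, how, rtype)
);
-- The denormalized Containers, carrying LType and RType so composite
-- foreign keys can check both the names and the association type.
CREATE TABLE containers (
    lft   TEXT,
    ltype TEXT,
    how   TEXT,
    rgt   TEXT,
    rtype TEXT,
    FOREIGN KEY (lft, ltype) REFERENCES names (name, type),
    FOREIGN KEY (rgt, rtype) REFERENCES names (name, type),
    FOREIGN KEY (ltype, how, rtype)
        REFERENCES types_of_associations (ltype, how, rtype)
);
""")
db.executemany("INSERT INTO names VALUES (?, ?)",
               [("Account", "Team"), ("Joe", "User"), ("Fred", "User")])
db.executemany("INSERT INTO types_of_associations VALUES (?, ?, ?)",
               [("Team", "member", "User"), ("Team", "lead", "User"),
                ("Group", "contain", "Team")])

# Valid: Team member User is a permitted association type.
db.execute("INSERT INTO containers VALUES ('Account','Team','member','Joe','User')")

# Invalid: User member Team is rejected by the composite foreign key.
try:
    db.execute("INSERT INTO containers VALUES ('Joe','User','member','Account','Team')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note that the third foreign key only checks the (LType, How, RType) triple; the first two tie LType and RType back to the actual Left and Right names, which is what keeps the redundant columns honest.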

According to my interpretation, the RType and LType columns of Containers are non-prime but are functionally dependent upon Right and Left respectively, so this table is not even in second normal form. But if I add the constraint that both the Name and Type must be in Names, does it really matter? I know that I will not be able to change just LType or just RType and create an update anomaly. I should mention that changing the Type of a name is not even being considered. I think of the Names and Types of Associations tables as a kind of constant data which will be defined and rarely change.

I hope this is reasonably easy to follow. I'm looking forward to your thoughts and comments.

Thank you,
Perry Smith

#2 John R Pierce
pierce@hogranch.com
In reply to: Perry Smith (#1)
Re: Normalize or not?

On 04/14/11 4:28 PM, Perry Smith wrote:

I hope this is reasonably easy to follow. I'm looking forward to your thoughts and comments.

At least on first glance, that looks like object-oriented methodology,
not relational.

#3 David G. Johnston
david.g.johnston@gmail.com
In reply to: Perry Smith (#1)
Re: Normalize or not?

It is not easy to follow...but I'll toss out some thoughts anyway.

I would generally not de-normalize the data model in order to make
referential integrity easier. If your "requirements" are that complex then
writing a wrapper around the insert/update layer for the tables in question
is a better idea. You can perform a higher level of validation without
sacrificing data integrity.
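The "wrapper around the insert/update layer" idea can be sketched in application code, keeping Containers in its normalized three-column form. SQLite again stands in for PostgreSQL so the sketch runs as-is, and the function name add_containment is my own invention; the table layout follows the original post.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE names (name TEXT PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE types_of_associations (ltype TEXT, how TEXT, rtype TEXT);
CREATE TABLE containers (lft TEXT, how TEXT, rgt TEXT);
""")
db.executemany("INSERT INTO names VALUES (?, ?)",
               [("Account", "Team"), ("Joe", "User")])
db.execute("INSERT INTO types_of_associations VALUES ('Team','member','User')")

def add_containment(db, left, how, right):
    """Insert into containers only if the derived (LType, How, RType)
    triple, looked up through names, appears in types_of_associations."""
    ok = db.execute("""
        SELECT 1
        FROM names ln, names rn, types_of_associations t
        WHERE ln.name = ? AND rn.name = ?
          AND t.ltype = ln.type AND t.how = ? AND t.rtype = rn.type
    """, (left, right, how)).fetchone()
    if ok is None:
        raise ValueError(f"{left} may not '{how}' {right}")
    db.execute("INSERT INTO containers VALUES (?, ?, ?)", (left, how, right))

add_containment(db, "Account", "member", "Joe")      # accepted
try:
    add_containment(db, "Joe", "member", "Account")  # a User cannot contain a Team
except ValueError as e:
    print("rejected:", e)
```

The trade-off is that the check only holds if every writer goes through the wrapper; the database itself will still accept a bad row inserted directly.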

I would suggest looking at both the kinds of selection queries you plan to
run as well as what kinds of data (i.e., relationships between containers
and objects) to see how readily the final model will work. Two things that
SQL does have challenges with are indefinite hierarchies and many-to-many
relationships with constraints. You seem to be using both. It isn't that
SQL cannot handle them but as a programmer you need to be extra careful to
introduce appropriate constraints on data entry and to spend extra time
figuring out what valid combinations are likely to result in incorrect
results given the kinds of queries you are using.

There isn't anything wrong with the normal form violation but you probably
want to understand where your model is vulnerable to update anomalies. If
you can add some additional tables and turn the de-normalized data into
foreign-keys you can use referential integrity to at least limit your
exposure but it may not remove it completely.

In the end sometimes the only real question of import is whether the
solution works. If you are concerned then anything built upon the schema
should be designed such that you can refactor the underlying schema if you
desire. But it isn't necessarily worth it to exhaust every possible avenue
to find the better solution if you have one that works. The question
becomes whether you have the resources (time and/or money) to search for the
better solution now or whether you should just use the immediately viable
solution until it breaks.

David J.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Perry Smith
Sent: Thursday, April 14, 2011 7:29 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Normalize or not?

I hope this is reasonably easy to follow. I'm looking forward to your
thoughts and comments.

Thank you,
Perry Smith

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4 Perry Smith
pedzsan@gmail.com
In reply to: David G. Johnston (#3)
Re: Normalize or not?

Mostly, this entire project is for me to learn. What started out as a toy has grown into something that about 40 people use each day, but it is definitely not under normal production stress. Of course, I'd like to assume and do things as if it were going to be stressed.

For whatever reason, I've done simple functions but I've never done complex wrappers like you are talking about. Just to be sure, when you say "writ[e] a wrapper around the insert/update layer", you are referring to the techniques documented in chapter 36, "The Rule System"?

That has always intimidated me, but perhaps it's time to grow.

Thank you again,
Perry

On Apr 14, 2011, at 7:00 PM, David Johnston wrote:


#5 David G. Johnston
david.g.johnston@gmail.com
In reply to: Perry Smith (#4)
Re: Normalize or not?

More along the lines of pl/pgsql and/or whatever application language you are using; not rules.

On Apr 15, 2011, at 8:45, Perry Smith <pedzsan@gmail.com> wrote:


#6 Perry Smith
pedzsan@gmail.com
In reply to: David G. Johnston (#5)
Re: Normalize or not?

Ah. I'm interfacing to the database via Ruby. So: write functions in pl/pgsql that do the insert, update, and delete while maintaining the integrity I want, and then call those from Ruby.
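A closely related option is to push the check into the database itself, so every client (Ruby included) goes through it no matter how it connects. In PostgreSQL that would be a pl/pgsql trigger function plus CREATE TRIGGER; below is an illustrative SQLite equivalent, runnable as-is, with table names following the thread.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE names (name TEXT PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE types_of_associations (ltype TEXT, how TEXT, rtype TEXT);
CREATE TABLE containers (lft TEXT, how TEXT, rgt TEXT);

-- Abort any insert whose derived (LType, How, RType) triple is not
-- present in types_of_associations.
CREATE TRIGGER containers_guard
BEFORE INSERT ON containers
BEGIN
    SELECT RAISE(ABORT, 'association type not allowed')
    WHERE NOT EXISTS (
        SELECT 1
        FROM names ln, names rn, types_of_associations t
        WHERE ln.name = NEW.lft AND rn.name = NEW.rgt
          AND t.ltype = ln.type AND t.how = NEW.how AND t.rtype = rn.type
    );
END;
""")
db.executemany("INSERT INTO names VALUES (?, ?)",
               [("Account", "Team"), ("Joe", "User")])
db.execute("INSERT INTO types_of_associations VALUES ('Team','member','User')")

db.execute("INSERT INTO containers VALUES ('Account','member','Joe')")  # passes
try:
    db.execute("INSERT INTO containers VALUES ('Joe','member','Account')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Unlike the application-side wrapper, the trigger also catches rows inserted directly with psql or any other tool, at the cost of putting logic in the database.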

On Apr 15, 2011, at 7:59 AM, David Johnston wrote:
