Shall I apply normalization in the following case?
For example, for the following table,
measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width
If I normalize to the following format, I will encounter several problem compared to table without normalization
measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1
measurement_unit_id
===================
id | value
----------
1 | mm
measurement_type_id
===================
id | value
----------
1 | width
(1) When rows grow to few millions in table measurement, the join operation on normalization table, is *much* slower compared to non-normalization table.
One of the most frequent used query, To retrieve "value", "measurement_unit" and "measurement_type", I need to join measurement + measurement_unit_id + measurement_type_id.
For non-normalization table, I need NOT to join.
Right now, I cannot justify why I need to use normalization. I afraid I miss out several important points when I turn into un-normalize solution. Please advice if you realize I miss out several important points.
Thanks
Yan Cheng CHEOK
You can always normalize and not use an artificial key.
You'd end up with:
measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm widthmeasurement_unit_id
===================
value
----------
mm
m
cm
in
cm²
m³
measurement_type_id
===================
value
----------
width
area
depth
volume
And so on. You'd benefit from a normalized structure, you'd have
constraints checking for valid units and types and you wouldn't need join to
get the resulting information.
--
Jorge Godoy <jgodoy@gmail.com>
On Wed, Feb 3, 2010 at 23:20, Yan Cheng Cheok <yccheok@yahoo.com> wrote:
Show quoted text
For example, for the following table,
measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm widthIf I normalize to the following format, I will encounter several problem
compared to table without normalizationmeasurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1measurement_unit_id
===================
id | value
----------
1 | mmmeasurement_type_id
===================
id | value
----------
1 | width(1) When rows grow to few millions in table measurement, the join operation
on normalization table, is *much* slower compared to non-normalization
table.One of the most frequent used query, To retrieve "value",
"measurement_unit" and "measurement_type", I need to join measurement +
measurement_unit_id + measurement_type_id.For non-normalization table, I need NOT to join.
Right now, I cannot justify why I need to use normalization. I afraid I
miss out several important points when I turn into un-normalize solution.
Please advice if you realize I miss out several important points.Thanks
Yan Cheng CHEOK--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
For example, for the following table,
measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm widthIf I normalize to the following format, I will encounter several problem compared to table without normalization
measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1measurement_unit_id
===================
id | value
----------
1 | mmmeasurement_type_id
===================
id | value
----------
1 | width
1) foreign key constraints are important, so you don't have things
misspelled or spelled differently and to define the "official" value.
2) querying on an int is quicker then querying on a string, so if you
query on the values without the join you will have better performance.
3) You might want to have more information in the other tables one day,
such as unit conversion information or descriptions, etc..
4) depending on the size of the string, it might take less space for an
int. Though a varchar with mm only takes 3 bytes, width takes 6 bytes,
while a regular int takes 4.
5) As Jorge mentioned you can make the value your pk instead of a serial
int and then you have it normalized and readable.
For the specific design that you are showing, there is no real benefit
to normalization, other then it would make it more scalable.
Sim
Sim Zacks wrote:
2) querying on an int is quicker then querying on a string, so if you
query on the values without the join you will have better performance.
The point of Jorge's approach is that it allows a query without a join.
With the OP's normalization, using integer keys, a join was required,
obviating any advantage to the integer key.
The assertion that a query involving integer keys is always faster than one
with string keys is unsafe at best. The first rule of performance
optimization is that optimizations don't.
You have to test and measure to find out when and if they do.
Misuse of surrogate keys to obtain optimization is a prime example. I've seen
situations in the field several times when such abuse forces extra joins in
queries and an increase in the number of indexes to maintain. Without
measurement under representative workloads, especially concurrent activity,
it's impossible to know whether the cost outweighs the benefit.
--
Lew
On Wed, 2010-02-03 at 17:20 -0800, Yan Cheng Cheok wrote:
For example, for the following table,
measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width
That looks normalized to me. Can you describe the redundancy that I
don't see? Do units of "mm" always imply "width" (I would think they
might also be used for "length").
Regards,
Jeff Davis
On Wed, 2010-02-03 at 23:56 -0200, Jorge Godoy wrote:
You can always normalize and not use an artificial key.
You'd end up with:
measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width
How is this different? It looks like you just added some FKs, which
don't change the normal form.
Regards,
Jeff Davis