Shall I apply normalization in the following case?

Started by Yan Cheng Cheokabout 16 years ago6 messagesgeneral
Jump to latest
#1Yan Cheng Cheok
yccheok@yahoo.com

For example, for the following table,

measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width

If I normalize to the following format, I will encounter several problem compared to table without normalization

measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1

measurement_unit_id
===================
id | value
----------
1 | mm

measurement_type_id
===================
id | value
----------
1 | width

(1) When rows grow to few millions in table measurement, the join operation on normalization table, is *much* slower compared to non-normalization table.

One of the most frequent used query, To retrieve "value", "measurement_unit" and "measurement_type", I need to join measurement + measurement_unit_id + measurement_type_id.

For non-normalization table, I need NOT to join.

Right now, I cannot justify why I need to use normalization. I afraid I miss out several important points when I turn into un-normalize solution. Please advice if you realize I miss out several important points.

Thanks
Yan Cheng CHEOK

#2Jorge Godoy
jgodoy@gmail.com
In reply to: Yan Cheng Cheok (#1)
Re: Shall I apply normalization in the following case?

You can always normalize and not use an artificial key.

You'd end up with:

measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------

1 0.23 mm width
2 0.38 mm width
2 0.72 mm width

measurement_unit_id
===================
value
----------
mm

m
cm
in
cm²

measurement_type_id
===================
value
----------
width

area
depth
volume

And so on. You'd benefit from a normalized structure, you'd have
constraints checking for valid units and types and you wouldn't need join to
get the resulting information.

--
Jorge Godoy <jgodoy@gmail.com>

On Wed, Feb 3, 2010 at 23:20, Yan Cheng Cheok <yccheok@yahoo.com> wrote:

Show quoted text

For example, for the following table,

measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width

If I normalize to the following format, I will encounter several problem
compared to table without normalization

measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1

measurement_unit_id
===================
id | value
----------
1 | mm

measurement_type_id
===================
id | value
----------
1 | width

(1) When rows grow to few millions in table measurement, the join operation
on normalization table, is *much* slower compared to non-normalization
table.

One of the most frequent used query, To retrieve "value",
"measurement_unit" and "measurement_type", I need to join measurement +
measurement_unit_id + measurement_type_id.

For non-normalization table, I need NOT to join.

Right now, I cannot justify why I need to use normalization. I afraid I
miss out several important points when I turn into un-normalize solution.
Please advice if you realize I miss out several important points.

Thanks
Yan Cheng CHEOK

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Sim Zacks
sim@compulab.co.il
In reply to: Yan Cheng Cheok (#1)
Re: Shall I apply normalization in the following case?

For example, for the following table,

measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width

If I normalize to the following format, I will encounter several problem compared to table without normalization

measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1

measurement_unit_id
===================
id | value
----------
1 | mm

measurement_type_id
===================
id | value
----------
1 | width

1) foreign key constraints are important, so you don't have things
misspelled or spelled differently and to define the "official" value.
2) querying on an int is quicker then querying on a string, so if you
query on the values without the join you will have better performance.
3) You might want to have more information in the other tables one day,
such as unit conversion information or descriptions, etc..
4) depending on the size of the string, it might take less space for an
int. Though a varchar with mm only takes 3 bytes, width takes 6 bytes,
while a regular int takes 4.
5) As Jorge mentioned you can make the value your pk instead of a serial
int and then you have it normalized and readable.

For the specific design that you are showing, there is no real benefit
to normalization, other then it would make it more scalable.

Sim

#4Lew
noone@lwsc.ehost-services.com
In reply to: Sim Zacks (#3)
Re: Shall I apply normalization in the following case?

Sim Zacks wrote:

2) querying on an int is quicker then querying on a string, so if you
query on the values without the join you will have better performance.

The point of Jorge's approach is that it allows a query without a join.

With the OP's normalization, using integer keys, a join was required,
obviating any advantage to the integer key.

The assertion that a query involving integer keys is always faster than one
with string keys is unsafe at best. The first rule of performance
optimization is that optimizations don't.

You have to test and measure to find out when and if they do.

Misuse of surrogate keys to obtain optimization is a prime example. I've seen
situations in the field several times when such abuse forces extra joins in
queries and an increase in the number of indexes to maintain. Without
measurement under representative workloads, especially concurrent activity,
it's impossible to know whether the cost outweighs the benefit.

--
Lew

#5Jeff Davis
pgsql@j-davis.com
In reply to: Yan Cheng Cheok (#1)
Re: Shall I apply normalization in the following case?

On Wed, 2010-02-03 at 17:20 -0800, Yan Cheng Cheok wrote:

For example, for the following table,

measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width

That looks normalized to me. Can you describe the redundancy that I
don't see? Do units of "mm" always imply "width" (I would think they
might also be used for "length").

Regards,
Jeff Davis

#6Jeff Davis
pgsql@j-davis.com
In reply to: Jorge Godoy (#2)
Re: Shall I apply normalization in the following case?

On Wed, 2010-02-03 at 23:56 -0200, Jorge Godoy wrote:

You can always normalize and not use an artificial key.

You'd end up with:

measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width

How is this different? It looks like you just added some FKs, which
don't change the normal form.

Regards,
Jeff Davis