redundant fields in table for "performance optimizations"
Let's say you have a father-child (or master-detail if you wish) hierarchy
of tables of not just 2 levels, but, say, 5 levels.
E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:
A ----1-to-N-----> B
B ----1-to-N-----> C
C ----1-to-N-----> D
D ----1-to-N-----> E
with appropriate foreign keys:
* from E to D
* from D to C
* from C to B
* from B to A
This is normalized so far. Now assume that it is the case than in some
queries on table E you also need to report a field that only exists on
table A. This will mean a JOIN between five tables: E, D, C, B and A. Some
questions follow:
[1]: assuming tables having a number of rows in the order of 100,000, after how many levels of depth would you feel justified to depart from the normalized schema and introduce some redundancy to speed up the queries?
how many levels of depth would you feel justified to depart from the
normalized schema and introduce some redundancy to speed up the queries?
[2]: is adding redundant fields and extra foreign keys (say directly from E to A) the best way to do this in 2012? Shouldn't some indexing and fine tuning suffice ?
to A) the best way to do this in 2012? Shouldn't some indexing and fine
tuning suffice ?
[3]: do you feel this is a legitimate concern in a modern PostgreSQL database running on high end (200,000 USD) hardware and serving no more than 1000 concurrent users with table sizes at the lowest (more detailed) level of the hierarchy in the order of a few tens of millions of rows at the most and dropping by a factor of 20 for each level up ?
database running on high end (200,000 USD) hardware and serving no more
than 1000 concurrent users with table sizes at the lowest (more detailed)
level of the hierarchy in the order of a few tens of millions of rows at
the most and dropping by a factor of 20 for each level up ?
Menelaos.
Depending on your problem domain, it might make sense to have multi-column
primary keys in some non-A tables, where a subset of their columns are the
foreign keys to the parents. In that case, you can skip some intermediary
tables in the joins. However, this would increase the size of your indexes and
slow updates.
A relevant question is whether these 5 tables all are self-similar or whether
each one is differently structured.
-- Darren Duncan
Menelaos PerdikeasSemantix wrote:
Show quoted text
Let's say you have a father-child (or master-detail if you wish)
hierarchy of tables of not just 2 levels, but, say, 5 levels.
E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:A ----1-to-N-----> B
B ----1-to-N-----> C
C ----1-to-N-----> D
D ----1-to-N-----> Ewith appropriate foreign keys:
* from E to D
* from D to C
* from C to B
* from B to AThis is normalized so far. Now assume that it is the case than in some
queries on table E you also need to report a field that only exists on
table A. This will mean a JOIN between five tables: E, D, C, B and A.
Some questions follow:[1] assuming tables having a number of rows in the order of 100,000,
after how many levels of depth would you feel justified to depart from
the normalized schema and introduce some redundancy to speed up the queries?[2] is adding redundant fields and extra foreign keys (say directly from
E to A) the best way to do this in 2012? Shouldn't some indexing and
fine tuning suffice ?[3] do you feel this is a legitimate concern in a modern PostgreSQL
database running on high end (200,000 USD) hardware and serving no more
than 1000 concurrent users with table sizes at the lowest (more
detailed) level of the hierarchy in the order of a few tens of millions
of rows at the most and dropping by a factor of 20 for each level up ?Menelaos.
Hi,
On 22 August 2012 07:07, Menelaos PerdikeasSemantix
<mperdikeas.semantix@gmail.com> wrote:
Let's say you have a father-child (or master-detail if you wish) hierarchy
of tables of not just 2 levels, but, say, 5 levels.
E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:A ----1-to-N-----> B
B ----1-to-N-----> C
C ----1-to-N-----> D
D ----1-to-N-----> Ewith appropriate foreign keys:
* from E to D
* from D to C
* from C to B
* from B to AThis is normalized so far. Now assume that it is the case than in some
queries on table E you also need to report a field that only exists on table
A. This will mean a JOIN between five tables: E, D, C, B and A. Some
questions follow:[1] assuming tables having a number of rows in the order of 100,000, after
how many levels of depth would you feel justified to depart from the
normalized schema and introduce some redundancy to speed up the queries?[3] do you feel this is a legitimate concern in a modern PostgreSQL database
running on high end (200,000 USD) hardware and serving no more than 1000
concurrent users with table sizes at the lowest (more detailed) level of the
hierarchy in the order of a few tens of millions of rows at the most and
dropping by a factor of 20 for each level up ?
I would ask different question(s): how "static" that tree structure is
and what kind of queries do you want to run:
- father-child: easy to understand; add new node; change leaf node;
hard to run some count(*) queries; and get hierarchy (CTEs are help
full)
- nested sets: pailful to move nodes around (even add new node); easy
to get tree subsets; ...
Anyway, I've found this summary:
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database
when I was googling for Joe Celko's Trees and Hierarchies book.
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)