Need advise for database structure for non linear data.

Started by Andre Lopesover 15 years ago9 messagesgeneral

lopes80andre@gmail.com

over 15 years ago

Hi,

I need advise about a database structure. I need to capture data from the
web about one specific subject on few specific websites and insert that data
to a database. I have done this question here before, but I think I have not
explained very well.

The problem with this task is that the information is not linear, if I try
to design tables with fields for all possible data I will end up with many
row fields with NULL values. There are any problem with this(end up with
many row fields with NULL values)? Or should I user other kind of structure?
For example store the data in one field and that field containing an
associative array with data.

What I mean with non linear data is the following:

array(
'name' => 'Don',
'age' => '31'
);

array(
'name' => 'Peter',
'age' => '28',
'car' => 'ford',
'km' => '2000'
);

In a specific website search I will store only "name" and "age", and in
other website I will store "name", "age", "car" and "km".

I don't know If I explain weel my problem. My english is not very good.

Best Regards.

Thomas Schmidt

postgres@stephan.homeunix.net

over 15 years ago

In reply to: Andre Lopes (#1)

Re: Need advise for database structure for non linear data.

Hello,

Am 03.01.11 12:11, schrieb Andre Lopes:

Hi,

I need advise about a database structure. I need to capture data from the
web about one specific subject on few specific websites and insert that data
to a database. I have done this question here before, but I think I have not
explained very well.

What I mean with non linear data is the following:

array(
'name' => 'Don',
'age' => '31'
);

array(
'name' => 'Peter',
'age' => '28',
'car' => 'ford',
'km' => '2000'
);

In a specific website search I will store only "name" and "age", and in
other website I will store "name", "age", "car" and "km".

I don't know If I explain weel my problem. My english is not very good.

In theory, using a single table having three columns
(array-id,key,value) will suit your needs.
However, providing a simple key/value store is not the idea behind DBMS
like postgres ...
See:
http://en.wikipedia.org/wiki/NoSQL
http://en.wikipedia.org/wiki/Relational_database_management_system

Thomas

Thomas Kellerer

spam_eater@gmx.net

over 15 years ago

In reply to: Andre Lopes (#1)

Re: Need advise for database structure for non linear data.

Andre Lopes wrote on 03.01.2011 12:11:

array(
'name' => 'Don',
'age' => '31'
);

array(
'name' => 'Peter',
'age' => '28',
'car' => 'ford',
'km' => '2000'
);

In a specific website search I will store only "name" and "age", and
in other website I will store "name", "age", "car" and "km".

I don't know If I explain weel my problem. My english is not very
good.

That's exactly what the hstore data type supports:

http://www.postgresql.org/docs/current/static/hstore.html

Regards
Thomas

Radosław Smogura

rsmogura@softperience.eu

over 15 years ago

In reply to: Thomas Schmidt (#2)

Re: Need advise for database structure for non linear data.

I can propose you something like this:

website(id int, url varchar);
attr_def (id int, name varchar);
attr_val (id int, def_id reference attr_def.id, website_id int
references website.id, value varchar);
If all of your attributes in website are single valued then you can
remove id from attr_val and use PK from website_id, def_id.

Depending on your needs one or many from following indexes:
attr_val(value) - search for attributes with value;
attr_val(def_id, value) - search for given attributes with values;
attr_val(website_id, def_id, value) - checks, if given site has
attribue, search by values for given site and attribute;
attr_val(def_id, website_id) - like above, without value searching;
attr_val(website_id, value) - search for attributes on given site with
value.
Probably you will use 2nd or 3rd index.

Example of search on website
select d.name, v.value from attre_def d join attr_val v on (v.def_id =
d.id) join website w on (v.website_id = w.id)
where d.name = 'xxxx' and w.url='http://somtehing'

This is common map structure.

Kind regards,
Radosław Smogura

On Mon, 03 Jan 2011 12:26:45 +0100, Thomas Schmidt
<postgres@stephan.homeunix.net> wrote:

Show quoted text

Hello,

Am 03.01.11 12:11, schrieb Andre Lopes:

Hi,

I need advise about a database structure. I need to capture data
from the
web about one specific subject on few specific websites and insert
that data
to a database. I have done this question here before, but I think I
have not
explained very well.

What I mean with non linear data is the following:

array(
'name' => 'Don',
'age' => '31'
);

array(
'name' => 'Peter',
'age' => '28',
'car' => 'ford',
'km' => '2000'
);

In a specific website search I will store only "name" and "age", and
in
other website I will store "name", "age", "car" and "km".

I don't know If I explain weel my problem. My english is not very
good.

In theory, using a single table having three columns
(array-id,key,value) will suit your needs.
However, providing a simple key/value store is not the idea behind
DBMS like postgres ...
See:
http://en.wikipedia.org/wiki/NoSQL
http://en.wikipedia.org/wiki/Relational_database_management_system

Thomas

Thomas Schmidt

postgres@stephan.homeunix.net

over 15 years ago

In reply to: Radosław Smogura (#4)

Re: Need advise for database structure for non linear data.

Hello,

Am 03.01.11 12:46, schrieb Radosław Smogura:

I can propose you something like this:

website(id int, url varchar);
attr_def (id int, name varchar);
attr_val (id int, def_id reference attr_def.id, website_id int
references website.id, value varchar);
If all of your attributes in website are single valued then you can
remove id from attr_val and use PK from website_id, def_id.

Depending on your needs one or many from following indexes:
attr_val(value) - search for attributes with value;

(...)

Probably you will use 2nd or 3rd index.

Example of search on website
select d.name, v.value from attre_def d join attr_val v on (v.def_id =
d.id) join website w on (v.website_id = w.id)
where d.name = 'xxxx' and w.url='http://somtehing'

Imho its hard - (if not impossible) to recommand a specific database
scheme (incl indexes) without knowing the applications taking plance
behind it.
Your schema is nice for specific querying, but might blow up if lots of
data is stored in the database (joins, index-building might be time
consuming).
On the other hand, google put some effort into their "BigTable"
http://en.wikipedia.org/wiki/BigTable for storing tons of data...

Thus - it all depends on the usage :-)

Thomas

Andre Lopes

lopes80andre@gmail.com

over 15 years ago

In reply to: Thomas Schmidt (#5)

Re: Need advise for database structure for non linear data.

Hi,

Thanks for the reply's. I was tempted to accept the Rodoslaw Smogura
proposal. There will be about 100 websites to capture data on daily basis.
Each website adds per day(average) 2 articles.

Thomas talked about the noSQL possibility. What do you think would be
better? I have no experience in noSQL and that could be a weakness.

Best Regards,
André

On Mon, Jan 3, 2011 at 11:58 AM, Thomas Schmidt <
postgres@stephan.homeunix.net> wrote:

Show quoted text

Hello,

Am 03.01.11 12:46, schrieb Radosław Smogura:

I can propose you something like this:

website(id int, url varchar);
attr_def (id int, name varchar);
attr_val (id int, def_id reference attr_def.id, website_id int references
website.id, value varchar);
If all of your attributes in website are single valued then you can remove
id from attr_val and use PK from website_id, def_id.

Depending on your needs one or many from following indexes:
attr_val(value) - search for attributes with value;

(...)

Probably you will use 2nd or 3rd index.

Example of search on website
select d.name, v.value from attre_def d join attr_val v on (v.def_id =
d.id) join website w on (v.website_id = w.id)
where d.name = 'xxxx' and w.url='http://somtehing'

Imho its hard - (if not impossible) to recommand a specific database scheme
(incl indexes) without knowing the applications taking plance behind it.
Your schema is nice for specific querying, but might blow up if lots of
data is stored in the database (joins, index-building might be time
consuming).
On the other hand, google put some effort into their "BigTable"
http://en.wikipedia.org/wiki/BigTable for storing tons of data...

Thus - it all depends on the usage :-)

Thomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Thomas Schmidt

postgres@stephan.homeunix.net

over 15 years ago

In reply to: Andre Lopes (#6)

Re: Need advise for database structure for non linear data.

Hello,

Am 03.01.11 14:14, schrieb Andre Lopes:

Hi,

Thanks for the reply's. I was tempted to accept the Rodoslaw Smogura
proposal. There will be about 100 websites to capture data on daily basis.
Each website adds per day(average) 2 articles.

Thomas talked about the noSQL possibility. What do you think would be
better? I have no experience in noSQL and that could be a weakness.

Imho RDBMS do a very good job in managing data on a relational basis.
However - there are alternatives and use cases for 'em and there is no
holy grail...
Not having any experience is a good point for not using it in production
:-).
However, if you've time to spare, looking into database design
(plain-sql and not-only-sql) will help.

I don't think that you get in trouble with a few hundered rows per day,
but keep in mind, what queries are used.

Thomas

Fredric Fredricson

Fredric.Fredricson@bonetmail.com

over 15 years ago

In reply to: Andre Lopes (#1)

Re: Need advise for database structure for non linear data.

On 01/03/2011 12:11 PM, Andre Lopes wrote:

[snip]
The problem with this task is that the information is not linear, if I
try to design tables with fields for all possible data I will end up
with many row fields with NULL values. There are any problem with
this(end up with many row fields with NULL values)? Or should I user
other kind of structure? For example store the data in one field and
that field containing an associative array with data.

As far as I understand NULL values are not really stored and a column
with many NULLs is not a problem as such, but if it is part of an index
the index might not be very useful.

At least that's my understanding of how SQL databases work. If I got
this wrong I hope someone will correct me.

[snip]

/Fredric

Stefan Keller

sfkeller@gmail.com

over 15 years ago

In reply to: Fredric Fredricson (#8)

Re: Need advise for database structure for non linear data.

Andre,

From a distant view of your problem I would like to vote for Thomas
Kellerer's proposal:
Maintain only the data you need (to enhance import/sync performance)
and use the hstore data type (as long as query performance is ok).

Yours, S.

2011/1/3 Fredric Fredricson <Fredric.Fredricson@bonetmail.com>:

Show quoted text

On 01/03/2011 12:11 PM, Andre Lopes wrote:

[snip]
The problem with this task is that the information is not linear, if I try
to design tables with fields for all possible data I will end up with many
row fields with NULL values. There are any problem with this(end up with
many row fields with NULL values)? Or should I user other kind of structure?
For example store the data in one field and that field containing an
associative array with data.

As far as I understand NULL values are not really stored and a column with
many NULLs is not a problem as such, but if it is part of an index the index
might not be very useful.

At least that's my understanding of how SQL databases work. If I got this
wrong I hope someone will correct me.

[snip]

/Fredric

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general