Aggregation question

Started by Dilip Angalalmost 22 years ago4 messages
#1Dilip Angal
dilip_angal@yahoo.com

Hi

I have a situation that I need flexible number columns to model the business requirements. It could go up to 1000 columns and will be a sparse matrix. One option I was considering was to have a table with col1......col1000
Other option I can consider is store all of them as name values in single column as a string

option 1
col1 | col2 |col 3 |........|col56|.....|col77|
10 | 2 | | | 4 | | |
2 | | 4 | | |.... |6 |

option 2

"col1=10,col2=2,col56=4"
"col1=2,col3=4,col77=6"

I will have about 50Million such records in a table.

I need to aggregate the column values for a given day by taking sum of all the rows

"col1=12,col2=2,col3=4,col56=4,col77=6"

Second option looks very elegant but aggregation is hard
Besides, second option may have performance penalty (Especially when you have 50 Million records)

Any one can give any pointers or comments on how to model this an how to aggregate it?

Thanks
Dilip

#2Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Dilip Angal (#1)
Re: Aggregation question

if you want to put 1000 columns into one table, your data structure
needs some further investigation. you are trying to solve the wrong problem.

Regards,

Hans

Dilip Angal wrote:

Hi

I have a situation that I need flexible number columns to model the
business requirements. It could go up to 1000 columns and will be a
sparse matrix. One option I was considering was to have a table with
col1......col1000
Other option I can consider is store all of them as name values in
single column as a string

option 1
col1 | col2 |col 3 |........|col56|.....|col77|
10 | 2 | | | 4 | | |
2 | | 4 | | |.... |6 |

option 2

"col1=10,col2=2,col56=4"
"col1=2,col3=4,col77=6"

I will have about 50Million such records in a table.

I need to aggregate the column values for a given day by taking sum of
all the rows

"col1=12,col2=2,col3=4,col56=4,col77=6"

Second option looks very elegant but aggregation is hard
Besides, second option may have performance penalty (Especially when you
have 50 Million records)

Any one can give any pointers or comments on how to model this an how to
aggregate it?

Thanks
Dilip

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

#3Bruno Wolff III
bruno@wolff.to
In reply to: Dilip Angal (#1)
Re: [HACKERS] Aggregation question

On Tue, Feb 03, 2004 at 21:27:48 -0800,
Dilip Angal <dilip_angal@yahoo.com> wrote:

Hi

I have a situation that I need flexible number columns to model the business requirements. It could go up to 1000 columns and will be a sparse matrix. One option I was considering was to have a table with col1......col1000

I don't think this is really a hackers' topic. Probably it belongs on general,
though with the sizes you are talking about performance might be appropiate.

I would think that the natural way to reprensent a sparse matrix would be
to use a three column table with row number, column number, and cell value.
Doing aggregates on a column number will be easy.

If the data isn't really a matrix and the different cells are different
data types, you may be able to store all the values as strings and convert
to the appropiate type as needed.

Show quoted text

Other option I can consider is store all of them as name values in single column as a string

option 1
col1 | col2 |col 3 |........|col56|.....|col77|
10 | 2 | | | 4 | | |
2 | | 4 | | |.... |6 |

option 2

"col1=10,col2=2,col56=4"
"col1=2,col3=4,col77=6"

I will have about 50Million such records in a table.

I need to aggregate the column values for a given day by taking sum of all the rows

"col1=12,col2=2,col3=4,col56=4,col77=6"

Second option looks very elegant but aggregation is hard
Besides, second option may have performance penalty (Especially when you have 50 Million records)

Any one can give any pointers or comments on how to model this an how to aggregate it?

Thanks
Dilip

#4Oleg Bartunov
oleg@sai.msu.su
In reply to: Hans-Jürgen Schönig (#2)
Re: Aggregation question

On Sat, 7 Feb 2004, [ISO-8859-1] Hans-J?rgen Sch?nig wrote:

if you want to put 1000 columns into one table, your data structure
needs some further investigation. you are trying to solve the wrong problem.

Dilip,

you may try our contrib/hstore from http://www.sai.msu.su/~megera/postgres/gist/
which is sort of perl hash and could provide you some flexibility.
But you may indeed need to think if you really needed such structure :)

Oleg

Regards,

Hans

Dilip Angal wrote:

Hi

I have a situation that I need flexible number columns to model the
business requirements. It could go up to 1000 columns and will be a
sparse matrix. One option I was considering was to have a table with
col1......col1000
Other option I can consider is store all of them as name values in
single column as a string

option 1
col1 | col2 |col 3 |........|col56|.....|col77|
10 | 2 | | | 4 | | |
2 | | 4 | | |.... |6 |

option 2

"col1=10,col2=2,col56=4"
"col1=2,col3=4,col77=6"

I will have about 50Million such records in a table.

I need to aggregate the column values for a given day by taking sum of
all the rows

"col1=12,col2=2,col3=4,col56=4,col77=6"

Second option looks very elegant but aggregation is hard
Besides, second option may have performance penalty (Especially when you
have 50 Million records)

Any one can give any pointers or comments on how to model this an how to
aggregate it?

Thanks
Dilip

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83