cube operations
Hi,
I have an array column which holds 12 real values. These values
represent the coordinates of a substance in 12 dimensions. My main
need is to find substances similar to a particular compound. Right now
I can do this by calculating the differences against every array in the
whole table, but the table has millions of rows, so I need some kind of
higher-dimensional index. I have read about the cube operation in
PostgreSQL; can it be extended to 12 dimensions, or something like that?
Thanks
Abhang
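For reference, the brute-force approach described in the question can be sketched as below. This is only an illustration under assumptions: the row layout (id, coords), the function names, and the sample data are hypothetical, and Euclidean distance is assumed as the similarity measure, which the original post does not specify.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_bruteforce(rows, query, k=3):
    """Return the k (distance, id) pairs closest to query, nearest first.

    rows is an iterable of (id, coords) pairs. Every row is visited,
    so the cost grows linearly with the number of rows.
    """
    scored = ((euclidean(coords, query), rid) for rid, coords in rows)
    return heapq.nsmallest(k, scored)

# Hypothetical 12-dimensional data, for illustration only.
rows = [
    ("a", [0.0] * 12),
    ("b", [1.0] * 12),
    ("c", [0.1] * 12),
]
query = [0.0] * 12
result = nearest_bruteforce(rows, query, k=2)
```

Because every row is compared against the query, this is exactly the full scan that becomes too slow with millions of rows.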
2007/5/16, ABHANG RANE <arane@indiana.edu>:
> Hi,
> I have an array column which holds 12 real values. These values
> represent the coordinates of a substance in 12 dimensions. My main
> need is to find substances similar to a particular compound. Right now
> I can do this by calculating the differences against every array in the
> whole table, but the table has millions of rows, so I need some kind of
> higher-dimensional index. I have read about the cube operation in
> PostgreSQL; can it be extended to 12 dimensions, or something like that?
Don't know if this helps, but have a look at intarray:
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/intarray/
If you feel brave, you could take this code and try to write some
proximity- or similarity-checking functions in C to speed up the
calculations.
Also consider representing values by integers, since integer
operations are much faster.
--
Filip Rembiałkowski
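The suggestion to represent the values as integers could look roughly like the sketch below. The scale factor of 1000 (three decimal places) and the function names are assumptions made for illustration. Whether integer arithmetic is actually faster for a given workload would need to be measured.

```python
def quantize(coords, scale=1000):
    """Map real coordinates to integers by fixed-point scaling.

    scale=1000 keeps three decimal places; it is an assumed value.
    """
    return [round(x * scale) for x in coords]

def squared_distance_int(a, b):
    """Squared Euclidean distance using only integer arithmetic."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Three dimensions are shown for brevity; 12 work the same way.
p = quantize([0.125, 2.5, -1.0])
q = quantize([0.375, 2.0, -1.0])
d2 = squared_distance_int(p, q)
```

Ordering rows by squared distance gives the same ranking as ordering by distance, so the square root can be skipped when only the ranking matters.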
ABHANG RANE wrote:
> I have an array column which holds 12 real values. These values
> represent the coordinates of a substance in 12 dimensions. My main
> need is to find substances similar to a particular compound. Right
> now I can do this by calculating the differences against every array
> in the whole table, but the table has millions of rows, so I need
> some kind of higher-dimensional index.
Is there any particular reason you're using an array? If every row
has all twelve values, I'd just make them columns. Then I could use
a multi-column index.
> I have read about the cube operation in PostgreSQL; can it be
> extended to 12 dimensions, or something like that?
I have no experience with CUBE, but I think it's just a kind of
summarization aggregate.
It sounds like you want the Nearest Neighbor(s) of your "particular
compound". You might want to read about that:
http://en.wikipedia.org/wiki/Nearest_neighbor_search
- John Burger
G63
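A note on the cube question: as far as I can tell from its documentation, the contrib/cube module is a data type for n-dimensional points and boxes with GiST index support, not the grouping aggregate of the same name. A box-based index is typically used in two steps: a cheap bounding-box test, then an exact distance recheck. The sketch below illustrates that pattern in plain Python. The function names, the radius, and the sample rows are hypothetical, and this is not the module's actual code.

```python
import math

def in_box(coords, center, radius):
    """True if coords lies inside the axis-aligned box center +/- radius."""
    return all(abs(x - c) <= radius for x, c in zip(coords, center))

def within_radius(rows, center, radius):
    """Return (distance, id) pairs within radius of center, nearest first.

    Step 1 is the cheap box test that an index could answer. Step 2
    rechecks the exact Euclidean distance, because the corners of the
    box lie outside the sphere of the same radius.
    """
    hits = []
    for rid, coords in rows:
        if not in_box(coords, center, radius):
            continue  # rejected by the box test alone
        dist = math.dist(coords, center)
        if dist <= radius:
            hits.append((dist, rid))
    return sorted(hits)

# Hypothetical 12-dimensional rows, for illustration only.
rows = [
    ("near", [0.1] * 12),
    ("corner", [0.9] * 12),  # inside the box, outside the sphere
    ("far", [2.0] * 12),
]
center = [0.0] * 12
found = within_radius(rows, center, radius=1.0)
```

The "corner" row passes the box test but fails the recheck. In 12 dimensions the box is much larger than the sphere it encloses, so many candidates may be discarded at the recheck step.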
Hacking contrib/intarray could help you. You need to add a function which
returns the number of overlapping elements.
Oleg
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
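The overlap-counting function suggested above could behave roughly as in the following sketch. It is written in Python only to show the intended result; an actual addition to contrib/intarray would be written in C. The function names and sample arrays are hypothetical, and real-valued coordinates would first have to be mapped to integers, which is an assumption not spelled out in the thread.

```python
def overlap_count(a, b):
    """Number of distinct values that appear in both integer arrays."""
    return len(set(a) & set(b))

def rank_by_overlap(rows, query):
    """Return (overlap, id) pairs, largest overlap first, ties by id."""
    scored = [(overlap_count(arr, query), rid) for rid, arr in rows]
    return sorted(scored, key=lambda t: (-t[0], t[1]))

# Hypothetical integer arrays, for illustration only.
rows = [
    ("x", [1, 2, 3, 4]),
    ("y", [3, 4, 5, 6]),
    ("z", [7, 8, 9]),
]
ranked = rank_by_overlap(rows, [3, 4, 5])
```

Rows sharing more elements with the query array rank as more similar.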
Hi,
But with 12 columns and a multicolumn index, won't this slow down
the search? I mean, in general, is retrieving 12 columns through a
multicolumn index slower or faster than using an index on a
12-element array?
Thanks
Abhang