cube operations

Started by ABHANG RANE, almost 19 years ago, 5 messages, general

arane@indiana.edu

almost 19 years ago

Hi,
I have an array column which holds 12 real values. These values
represent coordinates in 12 dimensions for a substance. My main need
is to find substances similar to a particular compound. I can do this
by calculating the difference from each array in the whole table, but
the table has millions of rows, so I need some kind of
higher-dimensional index. I have read about the cube operation in
PostgreSQL; can it be extended to 12 dimensions or something like that?

Thanks
Abhang
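The brute-force approach described in the question (comparing the query against every row) can be sketched in Python as follows. This is a hypothetical illustration: it assumes Euclidean distance as the similarity measure and that rows are already loaded into memory as (id, coordinates) pairs; the function names are made up for the example.

```python
import heapq
import math
from typing import Iterable, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length coordinate vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_brute_force(query: Sequence[float],
                        rows: Iterable[tuple[int, Sequence[float]]],
                        k: int = 1) -> list[tuple[int, float]]:
    """Return the k (row_id, distance) pairs closest to query.

    Every row is examined, so each query costs O(n * d) for n rows of
    dimension d; this is the full-table scan an index would avoid.
    """
    scored = ((row_id, euclidean(query, coords)) for row_id, coords in rows)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Example: three 12-dimensional rows and a query point.
rows = [(1, [0.0] * 12), (2, [1.0] * 12), (3, [0.5] * 12)]
print(nearest_brute_force([0.4] * 12, rows, k=2))
```

With millions of rows this scan is exactly the cost the question wants to avoid, which is why the replies turn to indexing.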

Filip Rembiałkowski

plk.zuber@gmail.com

almost 19 years ago

In reply to: ABHANG RANE (#1)

Re: cube operations

2007/5/16, ABHANG RANE <arane@indiana.edu>:

Don't know if this helps, but have a look at intarray:
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/intarray/
If you feel brave, you could take this code and try to write some
proximity- or similarity-checking functions in C to speed up the
calculations.

Also consider representing the values as integers, since integer
operations are much faster.
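The integer representation suggested here can be sketched as fixed-point scaling: multiply each real coordinate by a constant and round. The scale factor and function names below are assumptions for illustration. A larger scale keeps more precision but needs a wider integer type, and in Python the speed benefit may not show up at all; it is meant for compiled code such as a C similarity function.

```python
from typing import Sequence

# Hypothetical fixed-point scale: keeps roughly three decimal places.
SCALE = 1000


def quantize(coords: Sequence[float], scale: int = SCALE) -> list[int]:
    """Convert real coordinates to integers by scaling and rounding."""
    return [round(x * scale) for x in coords]


def squared_distance_int(a: Sequence[int], b: Sequence[int]) -> int:
    """Squared Euclidean distance using only integer arithmetic.

    The square root is skipped because it does not change the ranking
    of neighbors, which is all a similarity search needs.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) * (x - y) for x, y in zip(a, b))


compound = quantize([0.1234, -2.5, 3.0])
candidate = quantize([0.1200, -2.4, 3.0])
print(compound, candidate, squared_distance_int(compound, candidate))
```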

--
Filip Rembiałkowski

John D. Burger

john@mitre.org

almost 19 years ago

In reply to: ABHANG RANE (#1)

Re: cube operations

ABHANG RANE wrote:

I have an array column which holds 12 real values. These values
represent coordinates in 12 dimensions for a substance. My main need
is to find substances similar to a particular compound. I can do this
by calculating the difference from each array in the whole table, but
the table has millions of rows, so I need some kind of
higher-dimensional index.

Is there any particular reason you're using an array? If every row
has all twelve values, I'd just make them columns. Then I could use
a multi-column index.
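Separate columns make a per-coordinate range query expressible (c1 BETWEEN q1 - r AND q1 + r AND ...). The Python sketch below shows one way such a query could be combined with an exact distance check: a box filter, which an index on the coordinate columns might help with, followed by a refine step. It runs on in-memory rows, the `radius` parameter and function names are hypothetical, and it does not model how PostgreSQL's planner would actually use a multi-column index.

```python
import math
from typing import Sequence


def within_box(coords: Sequence[float], query: Sequence[float],
               radius: float) -> bool:
    """Per-coordinate range test: |c_i - q_i| <= radius for every i."""
    return all(abs(c - q) <= radius for c, q in zip(coords, query))


def range_then_refine(query, rows, radius):
    """Two-step search: cheap box filter, then exact Euclidean distance.

    Every point within `radius` of the query also lies inside the box,
    so the filter never discards a true match; the refine step drops
    box corners that fall outside the sphere.
    """
    candidates = [(rid, c) for rid, c in rows if within_box(c, query, radius)]
    hits = [(rid, math.dist(c, query)) for rid, c in candidates]
    return sorted((h for h in hits if h[1] <= radius), key=lambda h: h[1])


rows = [(1, [0.0] * 12), (2, [1.0] * 12), (3, [0.5] * 12)]
print(range_then_refine([0.4] * 12, rows, radius=0.5))
```

Row 1 passes the box filter here (each coordinate is within 0.5) but is rejected by the exact check, which illustrates why the refine step is needed.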

I have read about the cube operation in PostgreSQL; can it be
extended to 12 dimensions or something like that?

I have no experience with CUBE, but I think it's just a kind of
summarization aggregate.

It sounds like you want the Nearest Neighbor(s) of your "particular
compound". You might want to read about that:

http://en.wikipedia.org/wiki/Nearest_neighbor_search
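One classic exact structure for nearest-neighbor search is the k-d tree. The sketch below is a minimal in-memory version in Python, assuming Euclidean distance and (row_id, coordinates) pairs as in the question; it illustrates the technique and is not something PostgreSQL provides under these names. k-d trees are known to lose much of their pruning advantage as the number of dimensions grows, so at 12 dimensions the gain over a full scan depends heavily on the data.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    point: tuple[float, ...]
    row_id: int
    axis: int                       # coordinate index used to split here
    left: Optional["Node"] = None   # points with point[axis] <= node value
    right: Optional["Node"] = None  # points with point[axis] >= node value


def build(items: list, depth: int = 0) -> Optional[Node]:
    """Build a k-d tree from (row_id, point) pairs, splitting at the median."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    row_id, point = items[mid]
    return Node(tuple(point), row_id, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(root: Optional[Node], query) -> tuple[Optional[int], float]:
    """Return (row_id, distance) of the stored point closest to query."""
    best_id, best_dist = None, math.inf

    def search(node: Optional[Node]) -> None:
        nonlocal best_id, best_dist
        if node is None:
            return
        d = math.dist(node.point, query)
        if d < best_dist:
            best_id, best_dist = node.row_id, d
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        search(near)
        # The far side can only hold a closer point if the splitting
        # plane is nearer than the best distance found so far.
        if abs(diff) < best_dist:
            search(far)

    search(root)
    return best_id, best_dist


rows = [(1, [0.0] * 12), (2, [1.0] * 12), (3, [0.5] * 12)]
print(nearest(build(rows), [0.4] * 12))
```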

- John Burger
G63

Oleg Bartunov

oleg@sai.msu.su

almost 19 years ago

In reply to: John D. Burger (#3)

Re: cube operations

Hacking contrib/intarray could help you. You would need to add a function
that returns the number of overlapping elements.
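The computation such a function would perform, counting the elements two integer arrays have in common, can be sketched in Python. The function names are hypothetical, and the real thing would be a C function added to contrib/intarray; this only shows the logic. A set-based overlap ignores which position (dimension) a value came from, so applying it to coordinate vectors would require an encoding choice that the thread does not specify.

```python
from typing import Iterable


def overlap_count(a: Iterable[int], b: Iterable[int]) -> int:
    """Number of distinct values present in both integer arrays."""
    return len(set(a) & set(b))


def overlap_similarity(a: Iterable[int], b: Iterable[int]) -> float:
    """Overlap divided by the number of distinct values in either array.

    This normalisation (the Jaccard index) is one possible choice, not
    something intarray prescribes; it yields a score between 0 and 1.
    """
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


print(overlap_count([1, 2, 3, 3], [3, 4, 1]))
print(overlap_similarity([1, 2], [2, 3]))
```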

Oleg

On Wed, 16 May 2007, John D. Burger wrote:

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

ABHANG RANE

arane@indiana.edu

almost 19 years ago

In reply to: John D. Burger (#3)

Re: cube operations

Hi,
But with 12 columns and a multicolumn index, won't this slow down the
search? In general, is retrieving 12 columns through a multicolumn
index slower or faster than using an index on a 12-element array?

Thanks
Abhang