arrays of floating point numbers / linear algebra operations into the DB

Started by Enrico Sirola about 18 years ago. 18 messages. general
#1Enrico Sirola
enrico.sirola@gmail.com

Hello,
I'd like to perform linear algebra operations on float4/8 arrays.
These tasks are typically carried out using ad hoc optimized libraries
(e.g. BLAS). In order to do this, I studied a bit how arrays are
stored internally by the DB: from what I understood, arrays are
basically a vector of Datum, and floating point numbers are stored by
reference into Datums. At first glance, this seems to close the
discussion because in order to perform fast linear algebra operations,
you need to store array items in consecutive memory cells.
What are the alternatives? Create a new specialized data type for
floating point vectors?
Basically, the use-case is to be able to rescale, add and multiply
(element-by-element)
vectors.

Thanks for your help,
e.

#2Colin Wetherbee
cww@denterprises.org
In reply to: Enrico Sirola (#1)
Re: arrays of floating point numbers / linear algebra operations into the DB

Enrico Sirola wrote:

Hello,
I'd like to perform linear algebra operations on float4/8 arrays. These
tasks are typically carried out using ad hoc optimized libraries (e.g.
BLAS). In order to do this, I studied a bit how arrays are stored
internally by the DB: from what I understood, arrays are basically a
vector of Datum, and floating point numbers are stored by reference into
Datums. At first glance, this seems to close the discussion because in
order to perform fast linear algebra operations, you need to store array
items in consecutive memory cells.
What are the alternatives? Create a new specialized data type for
floating point vectors?
Basically, the use-case is to be able to rescale, add and multiply
(element-by-element)
vectors.

I'm not sure about the internals of PostgreSQL (e.g. the Datum object(?)
you mention), but if you're just scaling vectors, consecutive memory
addresses shouldn't be absolutely necessary. Add and multiply
operations within a linked list (which is how I'm naively assuming Datum
storage for arrays in memory is implemented) will be "roughly" just as fast.

How many scaling operations are you planning to execute per second, and
how many elements do you scale per operation?

Colin

#3Martijn van Oosterhout
kleptog@svana.org
In reply to: Enrico Sirola (#1)
Re: arrays of floating point numbers / linear algebra operations into the DB

On Fri, Feb 01, 2008 at 11:31:37AM +0100, Enrico Sirola wrote:

Hello,
I'd like to perform linear algebra operations on float4/8 arrays.
These tasks are typically carried out using ad hoc optimized libraries
(e.g. BLAS). In order to do this, I studied a bit how arrays are
stored internally by the DB: from what I understood, arrays are
basically a vector of Datum, and floating point numbers are stored by
reference into Datums.

Well, arrays are not vectors of Datum; they are a vector of the objects
they contain. When passed to a function, floats, arrays and other by-ref
types are passed by reference, but the array object itself does not
contain references; it contains the actual objects.

That doesn't necessarily make it the same as a C array, though; the
alignment considerations may be different. But at first glance it
certainly seems like an array would be in the right format for what
you're doing.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy

#4Enrico Sirola
enrico.sirola@gmail.com
In reply to: Colin Wetherbee (#2)
Re: arrays of floating point numbers / linear algebra operations into the DB

Hi Colin,

On 01 Feb 2008, at 15:22, Colin Wetherbee wrote:

I'm not sure about the internals of PostgreSQL (eg. the Datum
object(?) you mention), but if you're just scaling vectors,
consecutive memory addresses shouldn't be absolutely necessary. Add
and multiply operations within a linked list (which is how I'm
naively assuming Datum storage for arrays in memory is implemented)
will be "roughly" just as fast.

I'm not an expert, but the SSE instruction family should make a
difference when performing this kind of workload, and those
instructions work on consecutive memory cells.

How many scaling operations are you planning to execute per second,
and how many elements do you scale per operation?

Typically, arrays contain 1000 elements, and an operation is either
multiplying an array by a scalar or multiplying it element-by-element
with another array. Rescaling 1000 arrays, multiplying each by another
array and then summing the 1000 resulting arrays should be fast enough
for an interactive application (let's say 0.5s), at least when no disk
access is required. Disk access will obviously degrade performance a
bit at the beginning, but the workload is mostly read-only, so after a
while the whole table will be cached anyway. The table containing the
arrays would be truncated/repopulated every day, and the number of
arrays is expected to be more or less 150000 (at least this is what we
have now). Nowadays we have C++ middleware between the calculations
and the DB, with aggressive caching of the table contents (and we
don't use arrays, just a row per element), but the application could
be refactored (and simplified a lot) if we had a smart way to save the
data into the DB.
Bye,
e.

#5Webb Sprague
webb.sprague@gmail.com
In reply to: Enrico Sirola (#1)
Re: arrays of floating point numbers / linear algebra operations into the DB

On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:

Hello,
I'd like to perform linear algebra operations on float4/8 arrays.
These tasks are tipically carried on using ad hoc optimized libraries
(e.g. BLAS).

If there were a coherently designed, simple, and fast LAPACK/ MATLAB
style library and set of datatypes for matrices and vectors in
Postgres, I think that would be a HUGE plus for the project!

I would have used it on a project I am working on in mortality
forecasting (I would have been able to put all of my mathematics in
the database instead of using scipy), it would tie in beautifully with
the GIS and imagery efforts, it would ease fancy statistics
calculation on database infrastructure, it would provide useful
libraries for the datamining/ knowledge discovery types, etc, etc. If
we just had fast matrix arithmetic, eigen-stuff (including singular
value decomposition), convolution, random matrix generation, and
table <-> matrix functions, that would be amazing and would provide
the material for further library development since a lot of complex
algorithms just fall out when you can do advanced linear algebra.

We need to be able to convert transparently between matrices/ vectors
(which I think should be simple N by 1 matrices by default) and
arrays, but we would probably want to go for a separate datatype in
order to get speed since scientifically important matrices can be
HUGE.

Just my fairly worthless $0.02, as all I could offer would be to be a
tester and member of the peanut gallery, but there you go. Seems like
a perfect Summer of Code project for someone better at C-level
programming than me.

-W

#6Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Webb Sprague (#5)
Re: arrays of floating point numbers / linear algebra operations into the DB

Webb Sprague wrote:

On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:

I'd like to perform linear algebra operations on float4/8 arrays...

If there were a coherently designed, simple, and fast LAPACK/ MATLAB
style library and set of datatypes for matrices and vectors in
Postgres, I think that would be a HUGE plus for the project!

I'd also be very excited about this project.

Especially if some GIST or similar index could efficiently search
for vectors "close" to other vectors.

I assume something like "within a n-dimensional bounding box"
would be possible with GIST....

I'd be eager to help, test, debug, etc., but I'm probably not
qualified to take the lead on such a project.

#7Webb Sprague
webb.sprague@gmail.com
In reply to: Ron Mayer (#6)
Re: arrays of floating point numbers / linear algebra operations into the DB

(I had also meant to add that a linear algebra package would help
Postgres be the mediator for real-time data from things like
temperature sensors, and their relationship to not-so-scientific
data, say in a manufacturing environment.)

On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:

Webb Sprague wrote:

On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:

I'd like to perform linear algebra operations on float4/8 arrays...

If there were a coherently designed, simple, and fast LAPACK/ MATLAB
style library and set of datatypes for matrices and vectors in
Postgres, I think that would be a HUGE plus for the project!

I'd also be very excited about this project.

Especially if some GIST or similar index could efficiently search
for vectors "close" to other vectors.

That would be very interesting as we could play with a multitude of
different distance metrics from Analysis!!! Wow!

I'd be eager to help, test, debug, etc., but I'm probably not
qualified to take the lead on such a project.

I almost think the hardest part would be to spec it out and design the
interface to the libraries. Once we had that, the libraries are
already there, though figuring out how we are going to handle
gigabyte-sized elements (e.g. a satellite image) will require some
finesse, and perhaps some tiling ...

Hmm. If I get some more interest on this list (I need just one LAPACK
/ BLAS hacker...), I will apply for a pgFoundry project and appoint
myself head of the peanut gallery...

#8Joe Conway
mail@joeconway.com
In reply to: Enrico Sirola (#4)
Re: arrays of floating point numbers / linear algebra operations into the DB

Enrico Sirola wrote:

Typically, arrays contain 1000 elements, and an operation is either
multiplying an array by a scalar or multiplying it element-by-element
with another array. Rescaling 1000 arrays, multiplying each by another
array and then summing the 1000 resulting arrays should be fast enough
for an interactive application (let's say 0.5s), at least when no disk
access is required. Disk access will obviously degrade performance a
bit at the beginning, but the workload is mostly read-only, so after a
while the whole table will be cached anyway. The table containing the
arrays would be truncated/repopulated every day, and the number of
arrays is expected to be more or less 150000 (at least this is what we
have now). Nowadays we have C++ middleware between the calculations
and the DB, with aggressive caching of the table contents (and we
don't use arrays, just a row per element), but the application could
be refactored (and simplified a lot) if we had a smart way to save the
data into the DB.

I don't know if the speed will meet your needs, but you might test to
see if PL/R will work for you:

http://www.joeconway.com/plr/

You could use pg.spi.exec() from within the R procedure to grab the
arrays, do all of your processing inside R (which uses whatever BLAS
you've set it up to use), and then return the result out to Postgres.

Joe


#9Enrico Sirola
enrico.sirola@gmail.com
In reply to: Joe Conway (#8)
Re: arrays of floating point numbers / linear algebra operations into the DB

Hi Joe,

I don't know if the speed will meet your needs, but you might test
to see if PL/R will work for you:

http://www.joeconway.com/plr/

You could use pg.spi.exec() from within the R procedure to grab the
arrays, do all of your processing inside R (which uses whatever BLAS
you've set it up to use), and then return the result out to Postgres.

Thanks a lot for the hint, I'll give it a try. It will also make it
much easier to implement a prototype :-)
Bye,
e.

#10Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Webb Sprague (#7)
Re: arrays of floating point numbers / linear algebra operations into the DB

Webb Sprague wrote:

On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:

Webb Sprague wrote:

On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:

...linear algebra ...

... matrices and vectors .

...Especially if some GIST or similar index could efficiently search
for vectors "close" to other vectors...

Hmm. If I get some more interest on this list (I need just one LAPACK
/ BLAS hacker...), I will apply for a pgFoundry project and appoint
myself head of the peanut gallery...

I think you should start one. I'd be happy to help.

I'm rather proficient in C; somewhat literate about postgres' GIST
stuff (I think a couple of my bugfix patches were accepted in postgis);
and deal with a big database doing lots of similarity-based searches (a
6'2" guy with light brown hair being similar to a 6'1" guy with dark
blond hair) - and am experimenting with modeling some of the data as
vectors in postgres.

#11Webb Sprague
webb.sprague@gmail.com
In reply to: Ron Mayer (#10)
Re: arrays of floating point numbers / linear algebra operations into the DB

...linear algebra ...

... matrices and vectors .

...Especially if some GIST or similar index could efficiently search
for vectors "close" to other vectors...

Hmm. If I get some more interest on this list (I need just one LAPACK
/ BLAS hacker...), I will apply for a pgFoundry project and appoint
myself head of the peanut gallery...

I think you should start one. I'd be happy to help.

OK, you are on. I think designing an interface is the first step, and
I am inclined to use MATLAB syntax plus cool things I wish they had
(convolution matrices, recycling, etc.).

I'm rather proficient in C; somewhat literate about postgres' GIST
stuff (I think a couple of my bugfix patches were accepted in postgis);

Nifty! I am having trouble bending my head around how we can fit 10K
by 10K matrices into Datums, but if you have worked with PostGIS, then
experience with those big geographic fields might help a lot.

and deal with a big database doing lots of similarity-based searches (a
6'2" guy with light brown hair being similar to a 6'1" guy with dark
blond hair) - and am experimenting with modeling some of the data as
vectors in postgres.

Well, I bet a good linear algebra library would help. A lot. :)

#12Ted Byers
r.ted.byers@rogers.com
In reply to: Webb Sprague (#11)
Re: arrays of floating point numbers / linear algebra operations into the DB
--- Webb Sprague <webb.sprague@gmail.com> wrote:

...linear algebra ...

... matrices and vectors .

...Especially if some GIST or similar index could efficiently search
for vectors "close" to other vectors...

I see a potential problem here, in terms of how one
defines "close" or similitude. I think, though,
practical answers can be found in examples of applying
quantitative methods in some subdisciplines of
biology.

Hmm. If I get some more interest on this list (I need just one LAPACK
/ BLAS hacker...), I will apply for a pgFoundry project and appoint
myself head of the peanut gallery...

Someone pointed to the potential utility of PL/R. I would be
interested at least in learning about your assessment of the two
(PostGIS and PL/R). Alas, I don't have decent data I could use to
experiment with either (except possibly for time series analysis,
which is a completely different kettle of fish).

and deal with a big database doing lots of similarity-based searches
(a 6'2" guy with light brown hair being similar to a 6'1" guy with
dark blond hair) - and am experimenting with modeling some of the
data as vectors in postgres.

Well, I bet a good linear algebra library would
help. A lot. :)

If you're looking at similarity, and some practicality in the USE of
quantitative procedures, you may want to look into the biogeography
and numerical taxonomy literature, and to a lesser extent quantitative
plant ecology. All three subdisciplines of biology have decades of
experience, and copious literature, looking at similarity measures; in
my experience they are much more practical or pragmatic than the
'pure' biostatistics literature, and infinitely more practical than
any theoretical statistical or mathematical literature I have seen
(trust me, I have a bookcase full of this "stuff").

A good linear algebra library would be useful, but
there are a lot of nonlinear analyses that would be of
interest; and there are nonparametric, yet
quantitative approaches that are of considerable
interest in assessing similarity.

I don't know of work looking at applying things like
discriminant functions analysis or cluster analysis or
any of the many ordination analyses that may be
considered to searches in a database, but then I
haven't looked at the question since I graduated. I
am interested in the question, though, and would be
interested in hearing about your experience on the
question.

If I can manage the time, I hope to start a project
where I can store description data for specimens of
plants and animals, use analyses including but not
limited to ordination, clustering, discriminant
functions, canonical correlation, to create a
structure for comparing them, and for identifying new
specimens, or at a minimum, if the specimen is truly
something unknown, learn what known specimens or
groups thereof it is most similar to, and how it is
different.

I have managed to install PL/R, but I haven't had the time to figure
out how best to analyze data stored in the database using it. The
data I do have changes daily, and some of the tables are well over
100MB, so I am a bit worried about how well it can handle such an
amount of data, and how long it would take.

Cheers,

Ted

#13Webb Sprague
webb.sprague@gmail.com
In reply to: Enrico Sirola (#1)
Re: arrays of floating point numbers / linear algebra operations into the DB

On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:

Hello,
I'd like to perform linear algebra operations on float4/8 arrays

Having avoided a bunch of real work wondering about linear algebra and
PG, did you consider the Gnu Scientific Library ? We would still need
to hook everything together, but it seems to do a lot of this, and is
written in C, etc.

#14Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Ted Byers (#12)
Re: arrays of floating point numbers / linear algebra operations into the DB

Ted Byers wrote:

--- Webb Sprague <webb.sprague@gmail.com> wrote:

...linear algebra ...

... matrices and vectors .

...Especially if some GIST or similar index could efficiently search
for vectors "close" to other vectors...

I see a potential problem here, in terms of how one
defines "close" or similitude. I think, though,
practical answers can be found in examples of applying
quantitative methods in some subdisciplines of
biology.

Even if the best GiST can give is selecting vectors constrained by an
n-dimensional bounding box (in much the same way it does for PostGIS),
it can help a lot.

Then your application can select everything in a conservatively large
box, and use whatever its favorite metric is to narrow the data
further.

Someone pointed to the potential utility of PL/R. I would be
interested at least in learning about your assessment of the two
(PostGIS and PL/R).

I think they'd be complementary.

IMHO, if a native PostgreSQL datatype could allow indexes to narrow
the amount of data that needs to be processed, it'd be great to do
the rest of the work using R (though we're perhaps foolishly using
something else in our application).

If you're looking at similarity, and some practicality
in the USE of quantitative procedures, you may want to
look into the biogeography and numerical taxonomy
literature, and to a lesser extent quantitative plant
ecology.

Indeed. Though the current literature I wade through
is crime analysis. Ideally our software tries to
match witness descriptions of "black SUV" as substantially
similar to "dark green 4runner" - especially when seen
at night; and understand that a "Glock 19" is a
"9mm handgun" and that a "9mm handgun" might be but
isn't necessarily a "Glock 19". Also - two person
records with a tattoo of a cross might contribute a little
similarity -- but two records with tattoos of Darth
Maul / Maori face art contribute much more to the
similarity scores because they're so much more distinctive.

And of course there are entire companies focused on similarity
metrics for fingerprints, DNA, and names (Bill is arguably more
similar to William than to Bell).

Any magic indexes to help such queries would be
very cool; but so far we do most of it in our
application logic.

A good linear algebra library would be useful, but
there are a lot of nonlinear analyses that would be of
interest; and there are nonparametric, yet
quantitative approaches that are of considerable
interest in assessing similarity.

True, many things don't map cleanly to linear algebra, but a good
library would still be quite useful.

If I can manage the time, I hope to start a project
where I can store description data for specimens of
plants and animals, use analyses including but not
limited to ordination, clustering, discriminant
functions, canonical correlation, to create a
structure for comparing them, and for identifying new
specimens, or at a minimum, if the specimen is truly
something unknown, learn what known specimens or
groups thereof it is most similar to, and how it is
different.

Very cool. Sounds like a somewhat similar issue to mine.

#15Enrico Sirola
enrico.sirola@gmail.com
In reply to: Webb Sprague (#13)
Re: arrays of floating point numbers / linear algebra operations into the DB

Hi Webb, Joe, Martijn

Webb Sprague wrote:

On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:

Hello,
I'd like to perform linear algebra operations on float4/8 arrays

Having avoided a bunch of real work wondering about linear algebra and
PG, did you consider the Gnu Scientific Library ? We would still need
to hook everything together, but it seems to do a lot of this, and is
written in C, etc.

I experimented a bit today with CBLAS and wrapped the BLAS function
for scaling a vector. The following session shows the usage:

create or replace function scale(float8, float8[])
returns float8[]
as '$libdir/linalg', 'scale'
language 'C' immutable strict;

sps_dev=# select scale(k, '{1,2,3}') from generate_series(1,10) k;
scale
------------
{1,2,3}
{2,4,6}
{3,6,9}
{4,8,12}
{5,10,15}
{6,12,18}
{7,14,21}
{8,16,24}
{9,18,27}
{10,20,30}
(10 rows)

sps_dev=# create operator * (leftarg=float8, rightarg=float8[],
procedure=scale);

sps_dev=# select k * '{1,2,3}'::float8[] from generate_series(1,10) k;
?column?
------------
{1,2,3}
{2,4,6}
{3,6,9}
{4,8,12}
{5,10,15}
{6,12,18}
{7,14,21}
{8,16,24}
{9,18,27}
{10,20,30}
(10 rows)

I'm quite proud: this is my first C extension function ;-)
I'd gladly post the code if that's OK with the list users. It's more
or less 100 lines of code. This approach seems promising...

By the way, Webb: I took a look at GSL and it seems to me that, from
a linear algebra point of view, it's basically CBLAS, so I'd use
CBLAS directly.
Please let me know your thoughts/advice,
e.

#16Webb Sprague
webb.sprague@gmail.com
In reply to: Enrico Sirola (#15)
Re: arrays of floating point numbers / linear algebra operations into the DB

I'm quite proud, this is my first C extension function ;-)
I'd gladly post the code if it's ok for the list users. It's more or
less 100 lines of code. This approach seems promising...

I would definitely like to see it.

By the way, Webb: I took a look at GSL and it seems to me that, from a
linear algebra point of view, it's basically cblas, so I'd use cblas
directly.
Please let me know your thoughts/advices,

The only thing about GSL is that it would make it easier to tie into
some very sophisticated stuff later, and (I think) the basic linear
algebra is probably just as fast as CBLAS, and we could implement it
first. It would also be easy to define a big project as "bring GSL to
Postgres", and then people could work on pieces. But if you actually
write it, you get to decide :)

GSL licensing is GNU-ish, so maybe that is a deal breaker, too.

w


#17Enrico Sirola
enrico.sirola@gmail.com
In reply to: Webb Sprague (#16)
Re: arrays of floating point numbers / linear algebra operations into the DB

Hi Webb,

Webb Sprague wrote:

I'm quite proud, this is my first C extension function ;-)
I'd gladly post the code if it's ok for the list users. It's more or
less 100 lines of code. This approach seems promising...

I would definitely like to see it.

here it goes:

-------------------------->linalg.h<----------------------------------

#ifndef linalg_h
#define linalg_h

#include "postgres.h"
#include "utils/array.h"

Datum scale(PG_FUNCTION_ARGS);

#endif /* linalg_h */

-------------------------->linalg.c<----------------------------------

#include "linalg.h"
#include "fmgr.h"
#include <cblas.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(scale);

Datum
scale(PG_FUNCTION_ARGS)
{
    float8     x = PG_GETARG_FLOAT8(0);
    ArrayType *v1 = PG_GETARG_ARRAYTYPE_P(1);
    int       *dims1,
              *lbs1,
               ndims1,
               nitems1,
               ndatabytes1;
    int       *arrlbound1, *arrlbound2;
    char      *arrdatap1, *arrdatap2;
    ArrayType *rv;

    /* get argument array details */
    lbs1 = ARR_LBOUND(v1);
    dims1 = ARR_DIMS(v1);
    ndims1 = v1->ndim;
    nitems1 = ArrayGetNItems(ndims1, dims1);
    ndatabytes1 = ARR_SIZE(v1) - ARR_DATA_OFFSET(v1);

    if (ndims1 != 1)
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("multidimensional array given"),
                 errdetail("Array has %d dimensions", ndims1)));

    if (ARR_HASNULL(v1))
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("null in array"),
                 errdetail("Array should not contain null elements")));

    /* allocate new vector */
    rv = (ArrayType *) palloc(ndatabytes1);

    SET_VARSIZE(rv, ndatabytes1);
    rv->ndim = v1->ndim;
    rv->dataoffset = v1->dataoffset;    /* no nulls (0) */
    rv->elemtype = v1->elemtype;
    memcpy(ARR_DIMS(rv), ARR_DIMS(v1), sizeof(int));

    arrlbound2 = ARR_LBOUND(rv);
    arrlbound1 = ARR_LBOUND(v1);
    memcpy(arrlbound2, arrlbound1, sizeof(int));

    arrdatap1 = ARR_DATA_PTR(v1);
    arrdatap2 = ARR_DATA_PTR(rv);
    memcpy(arrdatap2, arrdatap1, nitems1 * sizeof(float8));

    /* scale the vector a la BLAS */
    cblas_dscal(nitems1, x, (float8 *) arrdatap2, 1);

    PG_RETURN_ARRAYTYPE_P(rv);
}

--------------------------->linalg.sql<---------------------------------

/* -*-sql-*- */
create or replace function scale(float8, float8[])
returns float8[]
as '$libdir/linalg', 'scale'
language 'C' immutable strict;

create aggregate array_accum (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);

create operator * (leftarg=float8, rightarg=float8[], procedure=scale);

-------------------------------->end<------------------------------------

GSL licensing is GNU ish, so may be that is a deal breaker, too.

well, I don't know. This is just a proof of concept. Anyway, yes, there
could be problems with GPL.

On the above code: coupling CBLAS functions with PG float8[] seems
easy; you just have to know which black-magic macros to use in order
to access the data structure. It took me a while to figure out how it
works (I'm not actually sure I understood everything, but at least
the above code seems to work). The problem with the code above is
that it doesn't work for vectors longer than 1000 elements or so (try
it with 2000 and it fails). I guess I should manage the "toasting"
machinery in some way; any suggestion is appreciated.
Bye,
e.

#18Enrico Sirola
enrico.sirola@gmail.com
In reply to: Enrico Sirola (#17)
Re: arrays of floating point numbers / linear algebra operations into the DB

Replying to myself:

Enrico Sirola wrote:

[...]

seems to work). The problem with the code above is that it doesn't
work for vectors longer than 1000 elements or so (try it with 2000
and it fails). I guess I should manage the "toasting" machinery in
some way; any suggestion is appreciated

Wrong; it was just that I forgot to add ARR_OVERHEAD_NONULLS(ndims1)
to the memory allocation for rv:

rv = (ArrayType *) palloc(ndatabytes1);

becomes

rv = (ArrayType *) palloc(ndatabytes1 + ARR_OVERHEAD_NONULLS(ndims1));

(and SET_VARSIZE needs that same total size), and now it seems to
work :)
and now it seems to work :)