efficient math vector operations on arrays

Started by Marcus Engene, over 10 years ago, 9 messages, general
#1Marcus Engene
mengpg2@engene.se

Hi,

Are there highly efficient C extensions out there for math operations on
arrays? Dot product and whatnot.

Example use case: sort items by Euclidean distance.

Kind regards,
Marcus

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Marcus Engene (#1)
Re: efficient math vector operations on arrays

Hi

2015-12-24 8:05 GMT+01:00 Marcus Engene <mengpg2@engene.se>:

Hi,

Are there highly efficient C extensions out there for math operations on
arrays? Dot product and whatnot.

What do you mean by "highly efficient"?

The PostgreSQL executor is an interpreter, so in almost all cases special
optimizations don't make much sense. If the operation takes only a few
microseconds, that time is lost in the executor's overhead anyway.

Example use case: sort items by Euclidean distance.

Some of this is in intarray: http://www.postgresql.org/docs/9.4/static/intarray.html

Regards

Pavel

#3Marcus Engene
mengpg2@engene.se
In reply to: Pavel Stehule (#2)
Re: efficient math vector operations on arrays

On 24/12/15 07:13, Pavel Stehule wrote:

Hi

2015-12-24 8:05 GMT+01:00 Marcus Engene <mengpg2@engene.se>:

Hi,

Are there highly efficient C extensions out there for math
operations on arrays? Dot product and whatnot.

What do you mean by "highly efficient"?

Implemented as a C module, so I won't have to use unnest or PL/pgSQL.

Kind regards,
Marcus

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Marcus Engene (#3)
Re: efficient math vector operations on arrays

2015-12-24 8:34 GMT+01:00 Marcus Engene <mengpg2@engene.se>:

On 24/12/15 07:13, Pavel Stehule wrote:

Hi

2015-12-24 8:05 GMT+01:00 Marcus Engene <mengpg2@engene.se>:

Hi,

Are there highly efficient C extensions out there for math operations on
arrays? Dot product and whatnot.

What do you mean by "highly efficient"?

Implemented as a C module, so I won't have to use unnest or PL/pgSQL.

ok,

I don't know of any extension that calculates Euclidean distance, but it
should be trivial in C, as long as you don't need generic types and generic
operations.

Pavel

#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pavel Stehule (#4)
Re: efficient math vector operations on arrays

On 12/24/15 1:56 AM, Pavel Stehule wrote:

I don't know of any extension that calculates Euclidean distance, but it
should be trivial in C, as long as you don't need generic types and generic
operations.

Before messing around with that, I'd recommend trying either pl/r or
pl/pythonu.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
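
A minimal sketch of what the Python side of that suggestion might look like,
for the "sort by Euclidean distance" use case from the original question. The
names euclid_distance and sort_by_distance are made up for this example and
do not come from any extension. Under pl/pythonu the body of euclid_distance
would sit inside a CREATE FUNCTION ... LANGUAGE plpythonu definition taking
two float8[] arguments, which PL/Python passes in as Python lists; that
wrapper is left out here so the snippet runs on its own.

```python
import math


def euclid_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sort_by_distance(items, origin):
    """Return items ordered by their distance from origin, nearest first."""
    return sorted(items, key=lambda item: euclid_distance(item, origin))


# Hypothetical sample data: three 2-D points, sorted around the origin.
points = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
print(sort_by_distance(points, [0.0, 0.0]))
# prints [[1.0, 1.0], [0.0, 2.0], [3.0, 4.0]]
```

In the database the sort itself would more likely be an ORDER BY on the
function result; sort_by_distance only mirrors that step in plain Python.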

#6Jony Cohen
jony.cohenjo@gmail.com
In reply to: Jim Nasby (#5)
Re: efficient math vector operations on arrays

Hi, I don't know if it's exactly what you're looking for, but the MADlib
package has utility functions for matrix and vector operations.
See: http://doc.madlib.net/latest/group__grp__array.html

Regards,
- Jony

On Fri, Dec 25, 2015 at 9:58 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/24/15 1:56 AM, Pavel Stehule wrote:

I don't know of any extension that calculates Euclidean distance, but it
should be trivial in C, as long as you don't need generic types and generic
operations.

Before messing around with that, I'd recommend trying either pl/r or
pl/pythonu.

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jony Cohen (#6)
Re: efficient math vector operations on arrays

On 12/27/15 2:00 AM, Jony Cohen wrote:

Hi, I don't know if it's exactly what you're looking for, but the MADlib
package has utility functions for matrix and vector operations.
See: http://doc.madlib.net/latest/group__grp__array.html

To apply an operator to all elements of an array or a pair of arrays, see
http://theplateisbad.blogspot.com/2015/12/the-arraymath-extension-vs-plpgsql.html
and https://github.com/pramsey/pgsql-arraymath.

See also
http://theplateisbad.blogspot.com/2015/12/more-fortran-90-like-vector-operations.html.

BTW, if you want to simply apply a function to all elements in an array
there is an internal C function array_map that can do it. There's no SQL
interface to it, but it shouldn't be hard to add one.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#7)
Re: efficient math vector operations on arrays

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

BTW, if you want to simply apply a function to all elements in an array
there is an internal C function array_map that can do it. There's no SQL
interface to it, but it shouldn't be hard to add one.

That wouldn't be useful for the example given originally, since it
iterates over just one array not two arrays in parallel. But you could
imagine writing something similar that would iterate over two arrays and
call a two-argument function.
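
As a rough sketch of that idea outside the server: map2 below is a
hypothetical name, not PostgreSQL's internal array_map. It walks two arrays
in parallel and calls a two-argument function on each pair of elements. The
dot product from the original question then reduces to a sum over the
pairwise products.

```python
import operator


def map2(func, left, right):
    """Apply a two-argument function to each pair of elements of two arrays."""
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    return [func(x, y) for x, y in zip(left, right)]


def dot(left, right):
    """Dot product built on top of the pairwise map."""
    return sum(map2(operator.mul, left, right))


print(map2(operator.sub, [5, 7, 9], [1, 2, 3]))  # prints [4, 5, 6]
print(dot([1, 2, 3], [4, 5, 6]))                 # prints 32
```

Note that func is still invoked once per pair of elements, which is the
per-element call overhead at issue in this thread.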

Whether it's worth a SQL interface is debatable though. Whatever
efficiency you might gain from using this would probably be eaten by the
overhead of calling a SQL or PL function for each pair of array elements.
You'd probably end up in the same ballpark performance-wise as the UNNEST
solution given earlier.

regards, tom lane

#9Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#8)
Re: efficient math vector operations on arrays

On 12/29/15 6:50 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

BTW, if you want to simply apply a function to all elements in an array
there is an internal C function array_map that can do it. There's no SQL
interface to it, but it shouldn't be hard to add one.

That wouldn't be useful for the example given originally, since it
iterates over just one array not two arrays in parallel. But you could
imagine writing something similar that would iterate over two arrays and
call a two-argument function.

Actually, I suspect you could pretty easily do array_map(regprocedure,
VARIADIC anyarray).

Whether it's worth a SQL interface is debatable though. Whatever
efficiency you might gain from using this would probably be eaten by the
overhead of calling a SQL or PL function for each pair of array elements.
You'd probably end up in the same ballpark performance-wise as the UNNEST
solution given earlier.

Take a look at [1]; using a rough equivalent to array_map is 6% faster
than unnest().

The array op array version is 30% faster than plpgsql, which, based on
the code at [2], I assume is doing

explain analyze select array(select a*b from unnest(array(select
random() from generate_series(1,1000000)), array(select random() from
generate_series(1,1000000))) u(a,b));

The syntactic sugar of r := array_map('function(a, b)', in1, in2) (let
alone r := in1 * in2;) is appealing too.
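
To make the two spellings concrete, here is a small sketch in plain Python.
The array_map function below is only a stand-in for the proposed
array_map(regprocedure, VARIADIC anyarray), and the Vec class is invented for
this example to show the r := in1 * in2 form. Neither is an existing
PostgreSQL interface.

```python
def array_map(func, *arrays):
    """Call func once per position, passing one element from each array."""
    if not arrays:
        raise ValueError("at least one array is required")
    if len({len(arr) for arr in arrays}) != 1:
        raise ValueError("all arrays must have the same length")
    return [func(*elems) for elems in zip(*arrays)]


class Vec:
    """Tiny illustrative wrapper; only elementwise multiplication is defined."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, other):
        # The operator form is sugar over the same variadic map.
        return Vec(array_map(lambda a, b: a * b, self.values, other.values))

    def __repr__(self):
        return "Vec(%r)" % (self.values,)


in1 = [1.0, 2.0, 3.0]
in2 = [4.0, 5.0, 6.0]

r = array_map(lambda a, b: a * b, in1, in2)  # function-call spelling
print(r)                                     # prints [4.0, 10.0, 18.0]
print(Vec(in1) * Vec(in2))                   # prints Vec([4.0, 10.0, 18.0])
```

Because array_map takes any number of arrays, the same function covers the
one-array case and the two-array case.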

[1]: http://theplateisbad.blogspot.com/2015/12/the-arraymath-extension-vs-plpgsql.html
[2]: http://theplateisbad.blogspot.com/2015/12/more-fortran-90-like-vector-operations.html