Best way to construct PostgreSQL ArrayType (_int4) from C int array
At the moment I am using the following code to construct a PostgreSQL
array from a C array in my C extension but I am not so sure if this is
really the best solution:
const int *data = array.data(); // C array
Datum *d = (Datum *) palloc(sizeof(Datum) * size);
for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
Is this okay or is there a better solution (existing function in the
PostgreSQL source for example)?
Cheers,
Adrian
On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
At the moment I am using the following code to construct a PostgreSQL
array from a C array in my C extension but I am not so sure if this is
really the best solution:const int *data = array.data(); // C array
Datum *d = (Datum *) palloc(sizeof(Datum) * size);for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
Is this okay or is there a better solution (existing function in the
PostgreSQL source for example)?
That's pretty much the best way AFAIK. Int32GetDatum doesn't do
anything fancy -- it's just a 32 bit mask/assignment. constructing
the array at once is going to be a lot better than incrementally
creating it. Do you expect the arrays to be large, say bigger than
10k elements?
merlin
The largest arrays I expect at the moment are more or less sparse
vectors of around 4.8k elements and I have noticed that the
input/output (C/C++ extension) does not scale well with the number of
elements in the array.
Using a function that sums all elements in the array, this is the time
it takes for ~150k arrays of various sizes (including ordering desc
and limit 10):
128: 61ms
256: 80ms
512: 681ms
1024 1065ms
2048 7682ms
4096 21332ms
That's why I thought that the construction of the PostgreSQL array was
not optimal.
Show quoted text
On Wed, Apr 27, 2011 at 14:49, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
At the moment I am using the following code to construct a PostgreSQL
array from a C array in my C extension but I am not so sure if this is
really the best solution:const int *data = array.data(); // C array
Datum *d = (Datum *) palloc(sizeof(Datum) * size);for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
Is this okay or is there a better solution (existing function in the
PostgreSQL source for example)?That's pretty much the best way AFAIK. Int32GetDatum doesn't do
anything fancy -- it's just a 32 bit mask/assignment. constructing
the array at once is going to be a lot better than incrementally
creating it. Do you expect the arrays to be large, say bigger than
10k elements?merlin
On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
The largest arrays I expect at the moment are more or less sparse
vectors of around 4.8k elements and I have noticed that the
input/output (C/C++ extension) does not scale well with the number of
elements in the array.Using a function that sums all elements in the array, this is the time
it takes for ~150k arrays of various sizes (including ordering desc
and limit 10):128: 61ms
256: 80ms
512: 681ms
1024 1065ms
2048 7682ms
4096 21332ms
hm, I'm not following you exactly -- what sql are you running? This
scales pretty well for me:
select array_dims(array(select generate_series(1,1000000)));
etc
merlin
On Wed, Apr 27, 2011 at 18:06, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
The largest arrays I expect at the moment are more or less sparse
vectors of around 4.8k elements and I have noticed that the
input/output (C/C++ extension) does not scale well with the number of
elements in the array.Using a function that sums all elements in the array, this is the time
it takes for ~150k arrays of various sizes (including ordering desc
and limit 10):128: 61ms
256: 80ms
512: 681ms
1024 1065ms
2048 7682ms
4096 21332mshm, I'm not following you exactly -- what sql are you running? This
scales pretty well for me:
select array_dims(array(select generate_series(1,1000000)));
etcmerlin
I have a C extension function that creates _int4 arrays of a specified
size with random elements, in this case 128,256,512 etc. Another
function from my extension returns the sum of the array. In this case
I created a table with around 150k arrays to benchmark the extension.
The query sums each array in the table and returns the 10 highest
numbers. The C extension is actually a wrapper around the Eigen 3
template library, which works pretty well - now I am trying to tweak
the input/output functions to get better performance with larger
arrays.
2011/4/27 Adrian Schreyer <ams214@cam.ac.uk>:
The largest arrays I expect at the moment are more or less sparse
vectors of around 4.8k elements and I have noticed that the
input/output (C/C++ extension) does not scale well with the number of
elements in the array.Using a function that sums all elements in the array, this is the time
it takes for ~150k arrays of various sizes (including ordering desc
and limit 10):
PostgreSQL doesn't use a index for access to array fields. So access
to fields of packed arrays can be slower for higher subscripts.
Regards
Pavel Stehule
Show quoted text
128: 61ms
256: 80ms
512: 681ms
1024 1065ms
2048 7682ms
4096 21332msThat's why I thought that the construction of the PostgreSQL array was
not optimal.On Wed, Apr 27, 2011 at 14:49, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
At the moment I am using the following code to construct a PostgreSQL
array from a C array in my C extension but I am not so sure if this is
really the best solution:const int *data = array.data(); // C array
Datum *d = (Datum *) palloc(sizeof(Datum) * size);for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
Is this okay or is there a better solution (existing function in the
PostgreSQL source for example)?That's pretty much the best way AFAIK. Int32GetDatum doesn't do
anything fancy -- it's just a 32 bit mask/assignment. constructing
the array at once is going to be a lot better than incrementally
creating it. Do you expect the arrays to be large, say bigger than
10k elements?merlin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general