Best way to construct PostgreSQL ArrayType (_int4) from C int array

Started by Adrian Schreyer · almost 15 years ago · 6 messages · general
#1 Adrian Schreyer
ams214@cam.ac.uk

At the moment I am using the following code to construct a PostgreSQL
array from a C array in my C extension, but I am not sure whether this
is really the best solution:

const int *data = array.data(); // C array
Datum *d = (Datum *) palloc(sizeof(Datum) * size);

for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);

ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');

Is this okay, or is there a better solution (an existing function in
the PostgreSQL source, for example)?

Cheers,

Adrian

#2 Merlin Moncure
mmoncure@gmail.com
In reply to: Adrian Schreyer (#1)
Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array

On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

> At the moment I am using the following code to construct a PostgreSQL
> array from a C array in my C extension but I am not so sure if this is
> really the best solution:
>
> const int *data = array.data(); // C array
> Datum *d = (Datum *) palloc(sizeof(Datum) * size);
>
> for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
>
> ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
>
> Is this okay or is there a better solution (existing function in the
> PostgreSQL source for example)?

That's pretty much the best way AFAIK. Int32GetDatum doesn't do
anything fancy -- it's just a 32-bit mask/assignment. Constructing
the array all at once is going to be a lot better than building it
up incrementally. Do you expect the arrays to be large, say bigger
than 10k elements?

merlin

#3 Adrian Schreyer
ams214@cam.ac.uk
In reply to: Merlin Moncure (#2)
Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array

The largest arrays I expect at the moment are more or less sparse
vectors of around 4.8k elements, and I have noticed that the
input/output (C/C++ extension) does not scale well with the number of
elements in the array.

Using a function that sums all elements in the array, these are the
times it takes for ~150k arrays of various sizes (including ordering
DESC and LIMIT 10):

128: 61ms
256: 80ms
512: 681ms
1024: 1065ms
2048: 7682ms
4096: 21332ms

That's why I thought that the construction of the PostgreSQL array was
not optimal.


#4 Merlin Moncure
mmoncure@gmail.com
In reply to: Adrian Schreyer (#3)
Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array

On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214@cam.ac.uk> wrote:

> The largest arrays I expect at the moment are more or less sparse
> vectors of around 4.8k elements and I have noticed that the
> input/output (C/C++ extension) does not scale well with the number of
> elements in the array.
>
> Using a function that sums all elements in the array, this is the time
> it takes for ~150k arrays of various sizes (including ordering desc
> and limit 10):
>
> 128: 61ms
> 256: 80ms
> 512: 681ms
> 1024: 1065ms
> 2048: 7682ms
> 4096: 21332ms

hm, I'm not following you exactly -- what SQL are you running? This
scales pretty well for me:
select array_dims(array(select generate_series(1,1000000)));
etc

merlin

#5 Adrian Schreyer
ams214@cam.ac.uk
In reply to: Merlin Moncure (#4)
Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array

On Wed, Apr 27, 2011 at 18:06, Merlin Moncure <mmoncure@gmail.com> wrote:

> hm, I'm not following you exactly -- what sql are you running?  This
> scales pretty well for me:
> select array_dims(array(select generate_series(1,1000000)));
> etc
>
> merlin

I have a C extension function that creates _int4 arrays of a specified
size with random elements, in this case 128, 256, 512, etc. Another
function from my extension returns the sum of an array. To benchmark
the extension, I created a table with around 150k arrays; the query
sums each array in the table and returns the 10 highest sums. The C
extension is actually a wrapper around the Eigen 3 template library,
which works pretty well -- now I am trying to tweak the input/output
functions to get better performance with larger arrays.

#6 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Adrian Schreyer (#3)
Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array

2011/4/27 Adrian Schreyer <ams214@cam.ac.uk>:

> The largest arrays I expect at the moment are more or less sparse
> vectors of around 4.8k elements and I have noticed that the
> input/output (C/C++ extension) does not scale well with the number of
> elements in the array.
>
> Using a function that sums all elements in the array, this is the time
> it takes for ~150k arrays of various sizes (including ordering desc
> and limit 10):

PostgreSQL doesn't use an index for access to array fields, so access
to fields of packed arrays can be slower for higher subscripts.

Regards

Pavel Stehule


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general