array_random

Started by jian he10 months ago9 messageshackers
Jump to latest
#1jian he
jian.universality@gmail.com

hi.

context: [1]/messages/by-id/CACJufxGRCP19Rm66=TSBwmEuVr92FwL_e6YFjmCpJrgu6Km9hQ@mail.gmail.com.
the attached patch already posted in [1]/messages/by-id/CACJufxGRCP19Rm66=TSBwmEuVr92FwL_e6YFjmCpJrgu6Km9hQ@mail.gmail.com.
I don't want to hijack another thread. so I post it in a separate thread.

The attached patch introduces a new function: array_random.
array_random description:
Returns an array filled with random values in the range min <= x <= max,
having dimensions of the lengths specified by dims. The optional lbounds
argument supplies lower-bound values for each dimension (which default
to all 1).

array_random function signature:
array_random(min int4, max int4, dims int[] [, lbounds int[]]) -> int[]
array_random(min int8, max int8, dims int[] [, lbounds int[]]) -> int8[]
array_random(min numeric, max numeric, dims int[] [, lbounds int[]])
-> numeric[]

demo:
SELECT array_random(1, 6, array[2,5], array[2,4]);
array_random
--------------------------------------
[2:3][4:8]={{6,2,2,5,4},{4,5,6,4,6}}

reasons for adding array_random is:
1. This is better than array_fill. This can fill random and constant
values (random, min and max the same).
2. Building a multi-dimensional PL/pgSQL function equivalent to
array_random is not efficient and is also not easier.

[1]: /messages/by-id/CACJufxGRCP19Rm66=TSBwmEuVr92FwL_e6YFjmCpJrgu6Km9hQ@mail.gmail.com

Attachments:

v1-0001-array_random.patchtext/x-patch; charset=US-ASCII; name=v1-0001-array_random.patchDownload+439-1
#2jian he
jian.universality@gmail.com
In reply to: jian he (#1)
Re: array_random

On Mon, Jun 30, 2025 at 11:04 PM jian he <jian.universality@gmail.com> wrote:

demo:
SELECT array_random(1, 6, array[2,5], array[2,4]);
array_random
--------------------------------------
[2:3][4:8]={{6,2,2,5,4},{4,5,6,4,6}}

reasons for adding array_random is:
1. This is better than array_fill. This can fill random and constant
values (random, min and max the same).
2. Building a multi-dimensional PL/pgSQL function equivalent to
array_random is not efficient and is also not easier.

[1] /messages/by-id/CACJufxGRCP19Rm66=TSBwmEuVr92FwL_e6YFjmCpJrgu6Km9hQ@mail.gmail.com

i didn't use ./src/include/catalog/duplicate_oids
then there is some duplicated oid issue.
the attached patch fixes this issue.

Attachments:

v2-0001-array_random.patchtext/x-patch; charset=US-ASCII; name=v2-0001-array_random.patchDownload+437-1
#3Vik Fearing
vik@postgresfriends.org
In reply to: jian he (#1)
Re: array_random

On 30/06/2025 17:04, jian he wrote:

reasons for adding array_random is:
1. This is better than array_fill. This can fill random and constant
values (random, min and max the same).
2. Building a multi-dimensional PL/pgSQL function equivalent to
array_random is not efficient and is also not easier.

I am not against this at all, but what is the actual use case?

--

Vik Fearing

#4Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Vik Fearing (#3)
Re: array_random

On Sat, 5 Jul 2025 at 08:32, Vik Fearing <vik@postgresfriends.org> wrote:

On 30/06/2025 17:04, jian he wrote:

reasons for adding array_random is:
1. This is better than array_fill. This can fill random and constant
values (random, min and max the same).
2. Building a multi-dimensional PL/pgSQL function equivalent to
array_random is not efficient and is also not easier.

I took a quick look at this and there are a number of things that I
don't like about the implementation. However, ...

I am not against this at all, but what is the actual use case?

I think that's a reasonable question to ask.

The original use case [1]/messages/by-id/87plssezpc.fsf@163.com was to produce arrays of random numbers for
testing gin indexes.

[1]: /messages/by-id/87plssezpc.fsf@163.com

However, this is adding quite a lot of C code for something that's
actually pretty easy to do as a user-defined function in SQL. Maybe
that's OK, if it's something that there's a lot of demand for, but
it's worth asking that question.

Regards,
Dean

#5jian he
jian.universality@gmail.com
In reply to: Vik Fearing (#3)
Re: array_random

On Sat, Jul 5, 2025 at 3:32 PM Vik Fearing <vik@postgresfriends.org> wrote:

On 30/06/2025 17:04, jian he wrote:

reasons for adding array_random is:
1. This is better than array_fill. This can fill random and constant
values (random, min and max the same).
2. Building a multi-dimensional PL/pgSQL function equivalent to
array_random is not efficient and is also not easier.

I am not against this at all, but what is the actual use case?

--

it seems not trivial to wrap up all the generated random values into a specific
multi-dimensional array (more than 2 dimensions).
for example, say we generated 24 random values and wanted to arrange them into a
3-dimensional array with shape [4, 3, 2].
we can easily use:
SELECT array_random(1, 6, array[4,3, 2]);

of course, we can use plpgsql to do it, but the c function would be
more convenient.
does this make sense?

#6Aleksander Alekseev
aleksander@timescale.com
In reply to: jian he (#5)
Re: array_random

Hi,

it seems not trivial to wrap up all the generated random values into a specific
multi-dimensional array (more than 2 dimensions).
for example, say we generated 24 random values and wanted to arrange them into a
3-dimensional array with shape [4, 3, 2].
we can easily use:
SELECT array_random(1, 6, array[4,3, 2]);

of course, we can use plpgsql to do it, but the c function would be
more convenient.
does this make sense?

The proposed function seems to do two things at a time - generating
random values and transforming them into an array of desired
dimensions. Generally we try to avoid such interfaces. Can you think
of something like array_transform() / array_reshape() that takes an
arbitrary single-dimension array and modifies it?

#7Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Aleksander Alekseev (#6)
Re: array_random

On Tue, 8 Jul 2025 at 15:26, Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

The proposed function seems to do two things at a time - generating
random values and transforming them into an array of desired
dimensions. Generally we try to avoid such interfaces. Can you think
of something like array_transform() / array_reshape() that takes an
arbitrary single-dimension array and modifies it?

That's a good point. Arguably, creating a simple 1-D array of random
values is trivial enough to leave to users, and there isn't sufficient
demand to justify creating core functions for it.

OTOH, creating multi-dimensional arrays (of any values, not just
random) is harder. Perhaps there is an argument for in-core functions
to make that easier, though I'm not sure how many people actually use
multi-dimensional arrays. In any case, that seems like a separate
discussion for a new thread, if there's demand.

Regards,
Dean

#8jian he
jian.universality@gmail.com
In reply to: Dean Rasheed (#7)
Re: array_random

On Fri, Jul 11, 2025 at 3:49 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Tue, 8 Jul 2025 at 15:26, Aleksander Alekseev
<aleksander@tigerdata.com> wrote:

The proposed function seems to do two things at a time - generating
random values and transforming them into an array of desired
dimensions. Generally we try to avoid such interfaces. Can you think
of something like array_transform() / array_reshape() that takes an
arbitrary single-dimension array and modifies it?

That's a good point. Arguably, creating a simple 1-D array of random
values is trivial enough to leave to users, and there isn't sufficient
demand to justify creating core functions for it.

use sql, we generally do something like:
select array_agg(random(1, 10)) from generate_series(1, 2) g;
but its performance is worse than array_random.

Does performance and other factors justify adding array_random?

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: jian he (#8)
Re: array_random

jian he <jian.universality@gmail.com> writes:

Does performance and other factors justify adding array_random?

There hasn't really been field demand for such a function, AFAIR.
There's an infinite amount of stuff that would be faster if coded
in C than if written in SQL or plpgsql. We can't support it all,
so we need some fairly high bar to accepting what functionality
we'll support.

In the case at hand, "fill a 1-D array with random values" is
quite trivial to do in either SQL or plpgsql. However, "reshape
an array to have such-and-such dimensions" isn't. So in the abstract
I'd be more sympathetic to inventing a function that can do reshaping.
But there is still the question of whether anybody really needs that
functionality.

regards, tom lane