[PATCH] Generate random dates/times in a specified range

Started by Damien Clochard8 months ago26 messages
Jump to latest
#1Damien Clochard
damien@dalibo.info

Hello,

As I am involved in the PostgreSQL Anonymizer extension, I found that
the random(min,max) functions introduced recently are very useful to
generate synthetic data or define a masking policy.

I decided to submit a similar set of functions for random dates and
times.

So this adds 5 new variants of the random() function:

random(min date, max date) returns date
random(min time, max time) returns time
random(min time, max time, zone text) returns timetz
random(min timestamp, max timestamp) returns timestamp
random(min timestamptz, max timestamptz) returns timestamptz

Each one returns a random date/time value t in the range min <= t <=
max.

For the timetz function, a third parameter is required to define the
timezone.
However if the value is an empty string, the session timezone is used.

These functions all rely on the pg_prng_int64_range function developped
in
PG 17 for the random(bigint,bigint) function.

Regards,

--
Damien Clochard

Attachments:

v1-0001-Generate-random-dates-times-in-a-specified-range.patchtext/x-diff; name=v1-0001-Generate-random-dates-times-in-a-specified-range.patchDownload+412-9
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Damien Clochard (#1)
Re: [PATCH] Generate random dates/times in a specified range

Damien Clochard <damien@dalibo.info> writes:

So this adds 5 new variants of the random() function:

random(min date, max date) returns date
random(min time, max time) returns time
random(min time, max time, zone text) returns timetz
random(min timestamp, max timestamp) returns timestamp
random(min timestamptz, max timestamptz) returns timestamptz

I'm a little uncomfortable with this proposal, mainly because it
overloads the random() function name to the point where I'm afraid
of "ambiguous function" failures in SQL code that used to be fine.

The traditional way of achieving these results would be something like

select now() + random() * interval '10 days';

and I'm not convinced that the use-case is so large as to justify
adding built-in forms of that.

regards, tom lane

#3Damien Clochard
damien@dalibo.info
In reply to: Tom Lane (#2)
Re: [PATCH] Generate random dates/times in a specified range

Le 10.07.2025 00:14, Tom Lane a écrit :

Damien Clochard <damien@dalibo.info> writes:

So this adds 5 new variants of the random() function:

random(min date, max date) returns date
random(min time, max time) returns time
random(min time, max time, zone text) returns timetz
random(min timestamp, max timestamp) returns timestamp
random(min timestamptz, max timestamptz) returns timestamptz

I'm a little uncomfortable with this proposal, mainly because it
overloads the random() function name to the point where I'm afraid
of "ambiguous function" failures in SQL code that used to be fine.

Hi

Thanks for the feedback !

I agree with this, I overloaded the random() function because this is
what was done previously with `random(int,int)` and I did the same like
the good sheep that I am :)
but i'm fine with renaming this functions to daterandom, timerandom or
whatever....

The traditional way of achieving these results would be something like

select now() + random() * interval '10 days';

and I'm not convinced that the use-case is so large as to justify
adding built-in forms of that.

From my experience, when users are writing a set of masking rules, they
tend to anonymize the dates with "a random date between start_date and
end_date"

Which can be trasnlated like this

SELECT start_date+(random()*(end_date-start_date))::interval;

But when you have hundreds of masking rules, the meaning of this one is
not so clear.

Now with PostgreSQL 18, we can write

SELECT random(start_date::int, end_date::int)::date;

Which is more explicit, but we could extend that logic to:

SELECT daterandom(start_date,end_date);

I agree this is merely syntactic sugar for the developers, but I don't
see why it is ok to provide random(int,int) or random(numeric,numeric)
and why random(date,date) is not.

Regards,

--
Damien Clochard

#4Greg Sabino Mullane
greg@turnstep.com
In reply to: Damien Clochard (#3)
Re: [PATCH] Generate random dates/times in a specified range

Patch review

OVERALL:

I like the idea, especially the date variant. Unlike Tom, I'm not
particularly concerned about breakage of existing scripts, as
most already are working just fine with raw numbers and I don't see this
patch breaking them.

In a selfish vein, I would use the "date" and timestamp variants a lot. I
would use the "time" ones seldom to never.

HOUSEKEEPING:

* Needs a pgindent run

* Commit message typo: developped

DOCS:

* Suggest adding an example of a random(date, date) call at the end of the
<row> section

* Typo: speficy

* Suggest: "current session's timezone"

* Suggest: remove "instead"

CODE:

* Code comments should be consistent with "Returns" not "Generate"

* result = (DateADT) (int32) pg_prng_int64_range(&prng_state, rmin, rmax);

Is the int32 cast needed here? My compiler seems fine without it

* This behaviour is based on generate_series_timestamptz_internal

Is it? I'm not seeing much overlap between the two

The addition of a timezone arg does complicate things. Is it really needed?

If not (just relying on session), timetzrandom becomes much much simpler
(e.g. time_timetz)

RUNNING:

* Found a bug:

greg=# select random('12:00'::time, '13:00'::time);
random
-----------------
12:41:35.612385

greg=# select random('12:00'::time, '13:00'::time, ''::text);
random
--------------------
12:20:58.308452-04

greg=# select random('12:00'::time, '13:00'::time, 'FOO'::text);
ERROR: time zone "FOO" not recognized

greg=# select random('12:00'::time, '13:00'::time, 'IST-9'::text);
random
--------------------
12:42:46.384039+09

greg=# select random('12:00'::time, '13:00'::time, 'IST'::text);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

At the very least, add a simple TLA timezone to the sql tests once fixed.

* Infinity

Fun, but is it useful? Perhaps disallow infinity. No strong opinion on this.

greg=# select random(now(), 'infinity'::date);
random
--------------------------------
86609-03-21 15:23:43.291271-04

greg=# select random('-infinity'::date, 'infinity'::date);
random
---------------
6126327-09-13

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#5Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Greg Sabino Mullane (#4)
Re: [PATCH] Generate random dates/times in a specified range

On Sat, 12 Jul 2025 at 16:15, Greg Sabino Mullane <htamfids@gmail.com> wrote:

I like the idea, especially the date variant. Unlike Tom, I'm not particularly concerned about breakage of existing scripts, as
most already are working just fine with raw numbers and I don't see this patch breaking them.

In a selfish vein, I would use the "date" and timestamp variants a lot. I would use the "time" ones seldom to never.

But it's completely trivial to emulate random(min_date, max_date), just by doing

min_date + random(0, max_date - min_date)

Is it really worth adding a core function for that?

Regards,
Dean

#6Robert Treat
xzilla@users.sourceforge.net
In reply to: Dean Rasheed (#5)
Re: [PATCH] Generate random dates/times in a specified range

On Mon, Jul 14, 2025 at 3:21 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Sat, 12 Jul 2025 at 16:15, Greg Sabino Mullane <htamfids@gmail.com> wrote:

I like the idea, especially the date variant. Unlike Tom, I'm not particularly concerned about breakage of existing scripts, as
most already are working just fine with raw numbers and I don't see this patch breaking them.

In a selfish vein, I would use the "date" and timestamp variants a lot. I would use the "time" ones seldom to never.

But it's completely trivial to emulate random(min_date, max_date), just by doing

min_date + random(0, max_date - min_date)

Is it really worth adding a core function for that?

I feel like this is a very similar argument against what was
ultimately the addition of timestamp based generate_series functions,
and similarly I think adding these in would be a rather useful
improvement for users, though like generate_series, we don't need to
hit every different data type (no one should ever generate a random
timetz for instance).

Robert Treat
https://xzilla.net

#7Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Robert Treat (#6)
Re: [PATCH] Generate random dates/times in a specified range

On Tue, 15 Jul 2025 at 04:49, Robert Treat <rob@xzilla.net> wrote:

On Mon, Jul 14, 2025 at 3:21 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

But it's completely trivial to emulate random(min_date, max_date), just by doing

min_date + random(0, max_date - min_date)

Is it really worth adding a core function for that?

I feel like this is a very similar argument against what was
ultimately the addition of timestamp based generate_series functions,
and similarly I think adding these in would be a rather useful
improvement for users, though like generate_series, we don't need to
hit every different data type (no one should ever generate a random
timetz for instance).

Right, and for generate_series() we didn't add a variant for type
date. In fact, we explicitly documented how a sequence of dates can be
generated using the date-plus-integer operator.

In fact, I think generate_series() sets a good precedent, and I could
get behind a proposal that adds new random() functions with the same
signatures as the timestamp-based generate_series() functions. In
particular, I think that it's quite likely that if I wanted a random
timestamp in some range, I would want some specified precision, like
say 'hour' or 'day', and not a timestamp with some random number of
microseconds.

(In a similar vein, it might be useful to add an optional "step"
parameter to the random integer/numeric functions, but that's really a
separate proposal.)

Regards,
Dean

#8Greg Sabino Mullane
greg@turnstep.com
In reply to: Dean Rasheed (#5)
Re: [PATCH] Generate random dates/times in a specified range

On Mon, Jul 14, 2025 at 3:21 AM Dean Rasheed <dean.a.rasheed@gmail.com>
wrote:

But it's completely trivial to emulate random(min_date, max_date), just by
doing

min_date + random(0, max_date - min_date)

Is it really worth adding a core function for that?

Yes, I think it is. It is also trivial to get a random int from 50 to 100
with

50 + floor(random() * 51)

but random(50,100) is so much nicer.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Sabino Mullane (#8)
Re: [PATCH] Generate random dates/times in a specified range

Greg Sabino Mullane <htamfids@gmail.com> writes:

On Mon, Jul 14, 2025 at 3:21 AM Dean Rasheed <dean.a.rasheed@gmail.com>
wrote:

Is it really worth adding a core function for that?

Yes, I think it is. It is also trivial to get a random int from 50 to 100
with
50 + floor(random() * 51)
but random(50,100) is so much nicer.

I won't object to adding this for date/timestamp/timestamptz, but
I don't really believe that the time/timetz variants are worth the
code space. I think the latter two would also have definitional
issues because of the modulo behavior of those types' underlying
arithmetic.

regards, tom lane

#10Greg Sabino Mullane
greg@turnstep.com
In reply to: Tom Lane (#9)
Re: [PATCH] Generate random dates/times in a specified range

Damien, maybe we can let the time ones go? Tom and I are not big fans of
those, and nobody else has stepped up to defend them.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#11Damien Clochard
damien@dalibo.info
In reply to: Greg Sabino Mullane (#10)
Re: [PATCH] Generate random dates/times in a specified range

Le 21.07.2025 21:06, Greg Sabino Mullane a écrit :

Damien, maybe we can let the time ones go? Tom and I are not big fans
of those, and nobody else has stepped up to defend them.

Sure !

Here's a second version with the following changes

- remove time and timetz variants
- disallow infinity boundaries
- remove unnecessary int32 cast
- add examples in the documentation
- run pgindent
- update tests
- fix typos

Thanks for the feedback and insights

--
Damien Clochard

Attachments:

v2-0001-Generate-random-dates-times-in-a-specified-range.patchtext/x-diff; name=v2-0001-Generate-random-dates-times-in-a-specified-range.patchDownload+243-9
#12Vik Fearing
vik@postgresfriends.org
In reply to: Damien Clochard (#11)
Re: [PATCH] Generate random dates/times in a specified range

On 24/07/2025 17:20, Damien Clochard wrote:

Le 21.07.2025 21:06, Greg Sabino Mullane a écrit :

Damien, maybe we can let the time ones go? Tom and I are not big fans
of those, and nobody else has stepped up to defend them.

Here's a second version with the following changes

- update tests

Just a small cosmetic gripe, the regression test comments are usually
"-- ok" and "--fail", and not "--  Should error".

No comment on the rest of the patch.

--

Vik Fearing

#13Damien Clochard
damien@dalibo.info
In reply to: Vik Fearing (#12)
Re: [PATCH] Generate random dates/times in a specified range

Le 24.07.2025 20:24, Vik Fearing a écrit :

[...]

Just a small cosmetic gripe, the regression test comments are usually
"-- ok" and "--fail", and not "--  Should error".

Thanks Vik!

Here's a third version with the following changes:

* Rebase from master
* Replace 'Should error' by 'fail' in the test file
* move documentation from func.sgml to func/func-datetime.sgml

Note that former random function variants are documented in
func/func-math.sgml but it felt more logical to document the new
variants in func-datetime.sgml. As a result the random variants would be
documented in 2 separate sections of chapter 9.

--
Damien Clochard

#14Greg Sabino Mullane
greg@turnstep.com
In reply to: Damien Clochard (#13)
Re: [PATCH] Generate random dates/times in a specified range

Your v3 did not get attached to the previous email.

#15Damien Clochard
damien@dalibo.info
In reply to: Greg Sabino Mullane (#14)
Re: [PATCH] Generate random dates/times in a specified range

Le 25.08.2025 15:33, Greg Sabino Mullane a écrit :

Your v3 did not get attached to the previous email.

My bad, here it is

--
Damien Clochard

Attachments:

v3-0001-Generate-random-dates-times-in-a-specified-range.patchtext/x-diff; name=v3-0001-Generate-random-dates-times-in-a-specified-range.patchDownload+253-9
#16Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Damien Clochard (#13)
Re: [PATCH] Generate random dates/times in a specified range

On Mon, 25 Aug 2025 at 12:42, Damien Clochard <damien@dalibo.info> wrote:

Note that former random function variants are documented in
func/func-math.sgml but it felt more logical to document the new
variants in func-datetime.sgml. As a result the random variants would be
documented in 2 separate sections of chapter 9.

Note the paragraph below the table listing the current random
functions, explaining the PRNG and the interaction with setseed(). If
you document the new functions in a separate section, that text would
need updating to refer to 2 separate locations (and possibly more in
the future), which seems a little messy. So I think it would be better
to keep documenting all the random functions together in the same
table.

Regards,
Dean

#17Greg Sabino Mullane
greg@turnstep.com
In reply to: Damien Clochard (#15)
Re: [PATCH] Generate random dates/times in a specified range

Patch looks good

#18Chao Li
li.evan.chao@gmail.com
In reply to: Damien Clochard (#15)
Re: [PATCH] Generate random dates/times in a specified range

On Aug 26, 2025, at 22:17, Damien Clochard <damien@dalibo.info> wrote:

Le 25.08.2025 15:33, Greg Sabino Mullane a écrit :

Your v3 did not get attached to the previous email.

My bad, here it is

--
Damien Clochard<v3-0001-Generate-random-dates-times-in-a-specified-range.patch>

Code change looks good to me. Maybe we can make check_range_boundaries() “inline” as well.

But a major problem is, I think we should bump CATALOG_VERSION_NO. Otherwise, running your code with an existing database, the new functions won’t work.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#19Vik Fearing
vik@postgresfriends.org
In reply to: Chao Li (#18)
Re: [PATCH] Generate random dates/times in a specified range

On 29/08/2025 11:12, Chao Li wrote:

On Aug 26, 2025, at 22:17, Damien Clochard <damien@dalibo.info> wrote:

Le 25.08.2025 15:33, Greg Sabino Mullane a écrit :

Your v3 did not get attached to the previous email.

My bad, here it is

--
Damien
Clochard<v3-0001-Generate-random-dates-times-in-a-specified-range.patch>

Code change looks good to me. Maybe we can make
check_range_boundaries() “inline” as well.

But a major problem is, I think we should bump CATALOG_VERSION_NO.
Otherwise, running your code with an existing database, the new
functions won’t work.

Traditionally, the patch committer bumps the catversion, not the patch
author.

--

Vik Fearing

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vik Fearing (#19)
Re: [PATCH] Generate random dates/times in a specified range

Vik Fearing <vik@postgresfriends.org> writes:

On 29/08/2025 11:12, Chao Li wrote:

But a major problem is, I think we should bump CATALOG_VERSION_NO.

Traditionally, the patch committer bumps the catversion, not the patch
author.

Yeah. If you include a catversion bump in a submitted patch, you can
expect the patch to break repeatedly while it's sitting in the queue,
due to unrelated patches changing catversion. So we prefer to have
the committer add that at the last moment.

If you're worried that the committer might forget that, you can
add a comment about it in the patch's draft commit message.

regards, tom lane

#21Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Damien Clochard (#15)
#22Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Dean Rasheed (#21)
#23Marcos Pegoraro
marcos@f10.com.br
In reply to: Dean Rasheed (#22)
#24Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Marcos Pegoraro (#23)
#25Marcos Pegoraro
marcos@f10.com.br
In reply to: Dean Rasheed (#24)
#26Damien Clochard
damien@dalibo.info
In reply to: Dean Rasheed (#22)