random() function documentation

Started by Dagfinn Ilmari Mannsåkeralmost 4 years ago9 messages

Hi Hackers,

I just noticed that the since the random() rewrite¹, the documentation's
claim² that it "uses a simple linear congruential algorithm" is no
longer accurate (xoroshiro128** is an xorshift variant, which is a
linear-feedback shift register algorithm).

I don't have a suggestion for the exact wording, since I don't know
whether xoroshiro128** qualifies as "simple", or to what level of
specificity we want to document the algorithm.

- ilmari

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3804539e48e794781c6145c7f988f5d507418fa8
[2]: https://www.postgresql.org/docs/devel/functions-math.html#FUNCTIONS-MATH-RANDOM-TABLE

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dagfinn Ilmari Mannsåker (#1)
Re: random() function documentation

=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:

I just noticed that the since the random() rewrite¹, the documentation's
claim² that it "uses a simple linear congruential algorithm" is no
longer accurate (xoroshiro128** is an xorshift variant, which is a
linear-feedback shift register algorithm).

I don't have a suggestion for the exact wording, since I don't know
whether xoroshiro128** qualifies as "simple", or to what level of
specificity we want to document the algorithm.

How about we just say "uses a linear-feedback shift register algorithm"?
"Simple" is in the eye of the beholder anyway.

regards, tom lane

In reply to: Tom Lane (#2)
Re: random() function documentation

Tom Lane <tgl@sss.pgh.pa.us> writes:

=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:

I just noticed that the since the random() rewrite¹, the documentation's
claim² that it "uses a simple linear congruential algorithm" is no
longer accurate (xoroshiro128** is an xorshift variant, which is a
linear-feedback shift register algorithm).

I don't have a suggestion for the exact wording, since I don't know
whether xoroshiro128** qualifies as "simple", or to what level of
specificity we want to document the algorithm.

How about we just say "uses a linear-feedback shift register algorithm"?

That works for me. Nice and simple, and not overly specific. Should we
perhaps also add a warning that the same seed is not guaranteed to
produce the same sequence across different (major?) versions?

"Simple" is in the eye of the beholder anyway.

Indeed.

regards, tom lane

- ilmari

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dagfinn Ilmari Mannsåker (#3)
Re: random() function documentation

=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

How about we just say "uses a linear-feedback shift register algorithm"?

That works for me. Nice and simple, and not overly specific. Should we
perhaps also add a warning that the same seed is not guaranteed to
produce the same sequence across different (major?) versions?

I wouldn't bother, on the grounds that then we'd need such disclaimers
in a whole lot of places. Others might see it differently though.

regards, tom lane

#5Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#4)
Re: random() function documentation

On Mon, 11 Apr 2022 at 20:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

How about we just say "uses a linear-feedback shift register algorithm"?

I think it'd be sufficient to just say that it's a deterministic
pseudorandom number generator. I don't see much value in documenting
the internal algorithm used.

Should we
perhaps also add a warning that the same seed is not guaranteed to
produce the same sequence across different (major?) versions?

I wouldn't bother, on the grounds that then we'd need such disclaimers
in a whole lot of places. Others might see it differently though.

Agreed, though I think when the release notes are written, they ought
to warn that the sequence will change with this release.

Regards,
Dean

#6Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Dean Rasheed (#5)
1 attachment(s)
Re: random() function documentation

How about we just say "uses a linear-feedback shift register algorithm"?

I think it'd be sufficient to just say that it's a deterministic
pseudorandom number generator. I don't see much value in documenting
the internal algorithm used.

Hmmm… I'm not so sure. ISTM that people interested in using the random
user-facing variants (only random?) could like a pointer on the algorithm
to check for the expected quality of the produced pseudo-random stream?

See attached.

Should we perhaps also add a warning that the same seed is not
guaranteed to produce the same sequence across different (major?)
versions?

I wouldn't bother, on the grounds that then we'd need such disclaimers
in a whole lot of places. Others might see it differently though.

Agreed,

Agreed.

though I think when the release notes are written, they ought
to warn that the sequence will change with this release.

Yes.

--
Fabien.

Attachments:

random-algo.patchtext/x-diff; name=random-algo.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0a5c402640..7492454592 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1832,10 +1832,11 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
    </table>
 
   <para>
-   The <function>random()</function> function uses a simple linear
-   congruential algorithm.  It is fast but not suitable for cryptographic
-   applications; see the <xref linkend="pgcrypto"/> module for a more
-   secure alternative.
+   The <function>random()</function> function uses
+   <ulink url="https://en.wikipedia.org/wiki/Xoroshiro128%2B">xoroshiro128**</ulink>, a
+   linear feedback shift register algorithm.
+   It is fast but not suitable for cryptographic applications;
+   see the <xref linkend="pgcrypto"/> module for a more secure alternative.
    If <function>setseed()</function> is called, the series of results of
    subsequent <function>random()</function> calls in the current session
    can be repeated by re-issuing <function>setseed()</function> with the same
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#6)
Re: random() function documentation

Fabien COELHO <coelho@cri.ensmp.fr> writes:

How about we just say "uses a linear-feedback shift register algorithm"?

I think it'd be sufficient to just say that it's a deterministic
pseudorandom number generator. I don't see much value in documenting
the internal algorithm used.

Hmmm… I'm not so sure. ISTM that people interested in using the random
user-facing variants (only random?) could like a pointer on the algorithm
to check for the expected quality of the produced pseudo-random stream?

See attached.

I don't want to get that specific. We were not specific before and
there has been no call for such detail in the docs. (Unlike
closed-source software, anybody who really wants algorithmic details
can find all they want to know in the source code.) It would just
amount to another thing to forget to update next time someone changes
the algorithm ... which is a consideration that leads me to favor
Dean's phrasing.

regards, tom lane

In reply to: Dean Rasheed (#5)
Re: random() function documentation

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Mon, 11 Apr 2022 at 20:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

How about we just say "uses a linear-feedback shift register algorithm"?

I think it'd be sufficient to just say that it's a deterministic
pseudorandom number generator. I don't see much value in documenting
the internal algorithm used.

Should we
perhaps also add a warning that the same seed is not guaranteed to
produce the same sequence across different (major?) versions?

I wouldn't bother, on the grounds that then we'd need such disclaimers
in a whole lot of places. Others might see it differently though.

Agreed, though I think when the release notes are written, they ought
to warn that the sequence will change with this release.

WFM on both points.

Regards,
Dean

- ilmari

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dagfinn Ilmari Mannsåker (#8)
Re: random() function documentation

=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

I think it'd be sufficient to just say that it's a deterministic
pseudorandom number generator. I don't see much value in documenting
the internal algorithm used.

WFM on both points.

Sold then, I'll make it so.

regards, tom lane