random() function produces wrong range
The comment in the random() function indicates that its author thought
it'd produce output in the range 0..1, which seems like a pretty
reasonable definition:
/* result 0.0-1.0 */
result = ((double) random()) / RAND_MAX;
Unfortunately, at least on my box, it produces no such thing. random()
actually yields values in the range 0..2^31-1 --- while RAND_MAX is
only 32767, because it applies to the rand() function not random().
So what I actually get is floating-point output in the range 0..65535.
regression=# select random();
random
------------------
35771.3981139561
(1 row)
regression=# select random();
random
------------------
58647.5821405683
(1 row)
This is, to say the least, a bizarre definition.
I would like to propose changing the code to
/* result 0.0-1.0 */
result = ((double) random()) / INT_MAX;
(and making the corresponding change in setseed()). But I wonder if
anyone out there has applications that depend on the current behavior.
As far as I can find, random() isn't mentioned in the documentation
currently, so there probably aren't many people using it...
regards, tom lane
On Tue, 1 Aug 2000, Tom Lane wrote:
The comment in the random() function indicates that its author thought
it'd produce output in the range 0..1, which seems like a pretty
reasonable definition:/* result 0.0-1.0 */
result = ((double) random()) / RAND_MAX;Unfortunately, at least on my box, it produces no such thing. random()
actually yields values in the range 0..2^31-1 --- while RAND_MAX is
only 32767, because it applies to the rand() function not random().
I would like to propose changing the code to
/* result 0.0-1.0 */
result = ((double) random()) / INT_MAX;(and making the corresponding change in setseed()). But I wonder if
anyone out there has applications that depend on the current behavior.
Actually, on my machines, both man pages for rand() and random() say
they return values between 0 and RAND_MAX (whether that's true or not
is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't
be a problem, but it might be problematic on some of the 64 bit machines.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
Actually, on my machines, both man pages for rand() and random() say
they return values between 0 and RAND_MAX (whether that's true or not
is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't
be a problem, but it might be problematic on some of the 64 bit machines.
Oh, that's interesting. What platform do you use? If RAND_MAX applies
to random() on some machines that'd probably explain why the code is
written like it is. But on my box (HPUX) the rand() function is old
and crufty and considerably different from random().
regards, tom lane
"Mike Sears" <msears@vianet.ca> writes:
What build of postgres is this running on.
I think random() is new in 7.0.
regards, tom lane
Import Notes
Reply to msg id not found: 002b01bffc07907fd2d002945bd1@neutrino
On Tue, 1 Aug 2000, Tom Lane wrote:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
Actually, on my machines, both man pages for rand() and random() say
they return values between 0 and RAND_MAX (whether that's true or not
is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't
be a problem, but it might be problematic on some of the 64 bit machines.Oh, that's interesting. What platform do you use? If RAND_MAX applies
to random() on some machines that'd probably explain why the code is
written like it is. But on my box (HPUX) the rand() function is old
and crufty and considerably different from random().
That's from a pair of linux boxes, although checking on a FreeBSD box a
friend has, his boxes man pages show the range as explicitly 0 to 2^31-1
as your box does.
Stephan Szabo wrote:
On Tue, 1 Aug 2000, Tom Lane wrote:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
Actually, on my machines, both man pages for rand() and random() say
they return values between 0 and RAND_MAX (whether that's true or not
is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't
be a problem, but it might be problematic on some of the 64 bit machines.Oh, that's interesting. What platform do you use? If RAND_MAX applies
to random() on some machines that'd probably explain why the code is
written like it is. But on my box (HPUX) the rand() function is old
and crufty and considerably different from random().That's from a pair of linux boxes, although checking on a FreeBSD box a
friend has, his boxes man pages show the range as explicitly 0 to 2^31-1
as your box does.
On my SCO 5.0.4 box, rand() from the man page...
The rand function uses a multiplicative congruential random-number
generator
with period 2^32 that returns successive pseudo-random numbers in the
range
from 0 to (2^15)-1.
The following functions define the semantics of the functions rand and
srand.
static unsigned long int next = 1;
int rand()
{
next = next * 1103515245 + 12345;
return ((unsigned int)(next/65536) % 32768);
}
void srand(seed)
unsigned int seed;
{
next = seed;
}
--
Dave Smith
Candata Systems Ltd.
(416) 493-9020
dave@candata.com
What build of postgres is this running on. I've tried this on 6x and it
doesn't seem to work.
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Stephan Szabo" <sszabo@megazone23.bigpanda.com>
Cc: <pgsql-hackers@postgreSQL.org>; <pgsql-general@postgreSQL.org>
Sent: Tuesday, August 01, 2000 11:23 AM
Subject: Re: [GENERAL] random() function produces wrong range
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
Actually, on my machines, both man pages for rand() and random() say
they return values between 0 and RAND_MAX (whether that's true or not
is another matter). In my case RAND_MAX==INT_MAX so the change wouldn't
be a problem, but it might be problematic on some of the 64 bit
machines.
Show quoted text
Oh, that's interesting. What platform do you use? If RAND_MAX applies
to random() on some machines that'd probably explain why the code is
written like it is. But on my box (HPUX) the rand() function is old
and crufty and considerably different from random().regards, tom lane
The comment in the random() function indicates that its author thought
it'd produce output in the range 0..1, which seems like a pretty
reasonable definition:
Unfortunately, at least on my box, it produces no such thing. random()
actually yields values in the range 0..2^31-1 --- while RAND_MAX is
only 32767, because it applies to the rand() function not random().
So what I actually get is floating-point output in the range 0..65535.
This is, to say the least, a bizarre definition.
Or, a bizarre machine. Linux (where I did the testing) produces the
expected result.
I would like to propose changing the code to
/* result 0.0-1.0 */
result = ((double) random()) / INT_MAX;
Erk...
Actually, I depend on the behavior being as advertised, which is what
you would have expected. This is true on Linux boxes from RedHat-5.2 to
Mandrake-7.1. Not sure why your machine is different, but if there is a
more portable way to define this let's find it. Otherwise, get used to
#ifdef HPUX ;)
The Linux man pages indicate that the behavior and underlying
implementation of random() and rand() are the same (so I just picked
one). Would it be better to try using rand() instead?
- Thomas
Hope this function won't be removed. I'm using it
to select a random row like "select xxx from yyy
order by random() limit 1". It's painful to do it
before 7.0.
Regards,
--
Guo Bin
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
As far as I can find, random() isn't mentioned in the
documentation
currently, so there probably aren't many people using
it...regards, tom lane
__________________________________________________
Do You Yahoo!?
Kick off your party with Yahoo! Invites.
http://invites.yahoo.com/
Import Notes
Resolved by subject fallback
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
The Linux man pages indicate that the behavior and underlying
implementation of random() and rand() are the same (so I just picked
one).
Ah, well, there's your problem. Whoever did this part of the library
on Linux took shortcuts. On older-line systems, rand() is a
considerably older and crummier generator than random(). It would
definitely not be a wise decision to use rand() instead.
It appears that on SysV-heritage machines, rand() delivers 15-bit
results (which is what I'm getting) whereas on BSD-heritage platforms
it produces 31-bit results. But even the BSD machines say
The spectral properties of rand() leave a great deal to be
desired. drand48(3) and random(3) provide much better,
though more elaborate, random-number generators.
(quote from SunOS 4.1 man page for rand()).
I believe using random() is the right thing. The portability bug here
is the assumption that RAND_MAX applies to random() (or is even defined;
none of the man pages I've looked at so far mention it). But all the
machines say that the output of random() is 31 bits, so INT_MAX should
work.
regards, tom lane
Tom Lane writes:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
The Linux man pages indicate that the behavior and underlying
implementation of random() and rand() are the same (so I just picked
one).Ah, well, there's your problem. Whoever did this part of the library
on Linux took shortcuts.
Why? Linux is compliant with the Single Unix Standard v2 from my brief
reading of it. The only fragile part is the man page for random(3)
which on Linux says "range from 0 to RAND_MAX" whereas SuSv2 says
"range from 0 to 2^31-1". Since RAND_MAX on Linux is actually 2^31-1
anyway, it is still correct albeit misleading. Documentation bug, say.
I believe using random() is the right thing. The portability bug here
is the assumption that RAND_MAX applies to random() (or is even defined;
none of the man pages I've looked at so far mention it). But all the
machines say that the output of random() is 31 bits, so INT_MAX should
work.
SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
be non-portable to platforms with 64-bit ints, for example.
--Malcolm
--
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services
Malcolm Beattie <mbeattie@sable.ox.ac.uk> writes:
Tom Lane writes:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
The Linux man pages indicate that the behavior and underlying
implementation of random() and rand() are the same (so I just picked
one).Ah, well, there's your problem. Whoever did this part of the library
on Linux took shortcuts.
Why? Linux is compliant with the Single Unix Standard v2 from my brief
reading of it.
Sorry, bad choice of words on my part. Linux is within its rights to
use the same underlying implementation for rand() and random(), but it
is a poor guide to the behavior of other systems, which are equally
within their rights not to.
none of the man pages I've looked at so far mention it). But all the
machines say that the output of random() is 31 bits, so INT_MAX should
work.
SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
be non-portable to platforms with 64-bit ints, for example.
Maybe. You don't think that a 64-bit-int platform would choose to
supply a random() function with a range of 2^63-1? The HPUX and SunOS
man pages clearly specify that random()'s result is "long", so I think
a case could also be made for LONG_MAX.
I suspect we have a good chance at getting burned no matter what we use
:-(. But RAND_MAX is definitely the wrong thing.
regards, tom lane
none of the man pages I've looked at so far mention it). But all the
machines say that the output of random() is 31 bits, so INT_MAX should
work.SuSv2 says explicitly 2^31-1 so you should use that, otherwise you'll
be non-portable to platforms with 64-bit ints, for example.Maybe. You don't think that a 64-bit-int platform would choose to
supply a random() function with a range of 2^63-1? The HPUX and SunOS
man pages clearly specify that random()'s result is "long", so I think
a case could also be made for LONG_MAX.I suspect we have a good chance at getting burned no matter what we use
:-(. But RAND_MAX is definitely the wrong thing.
Is it possible to test (during configure phase) and then go from there...
or does it need to be the same for all platforms?
-
- Thomas Swan
- Graduate Student - Computer Science
- The University of Mississippi
-
- "People can be categorized into two fundamental
- groups, those that divide people into two groups
- and those that don't."
Thomas Swan <tswan@olemiss.edu> writes:
I suspect we have a good chance at getting burned no matter what we use
:-(. But RAND_MAX is definitely the wrong thing.
Is it possible to test (during configure phase) and then go from there...
or does it need to be the same for all platforms?
I thought about that last night. We could do a configure test. Since
it'd be probing random() results there'd be a small probability of
failure, but if we wire in an assumption that the max value must be
2^(15 + n*16)-1 for some n, ten or so probes would give us a failure
probability on the order of 2^-160, which ought to satisfy anyone.
However, in the absence of any hard evidence that there are platforms
where the value is different from 2^31-1, it's probably just a waste of
configuration cycles at the moment.
I suggest we add a config.h constant like
/* The local random() function yields values 0 .. MAX_RANDOM_VALUE */
#define MAX_RANDOM_VALUE <2^31-1>
and use that in the code. Then, if we ever find a platform where
random() does actually produce 64-bit results, it'll be time enough
to crank up a real configure test to set the value.
Comments?
regards, tom lane
-----BEGIN PGP SIGNED MESSAGE-----
"Thomas" == Thomas Swan <tswan@olemiss.edu> writes:
Thomas> Is it possible to test (during configure phase) and then
Thomas> go from there... or does it need to be the same for all
Thomas> platforms?
You should be able to test, but it is, of course, a statistical test.
Call random() several times and test the maximum value against your
thresholds of 2^15 and 2^31. If random() is generating values in the
range 1:2^31-1, you would expect half of your values to be greater
than 2^15-1; more importantly, if you generate, say, 10 values, you
expect only a 1:1024 chance that they are all below 2^15. If those
odds aren't good enough, generate more.
roland
- --
PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD Unix Software Solutions
roberts@panix.com 76-15 113th Street, Apt 3B
rbroberts@acm.org Forest Hills, NY 11375
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3a
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface
iQCVAwUBOYmNSeoW38lmvDvNAQHFegQAlexEvaG0t+1H0IkPWikbdMUIck1fE0DI
rfcGi1M/SQ6K9Hmvi1HB/SVEU4DKGaHDoqlU7ei78OgOzchbsLL5cqAJNIsKv5QJ
F4u/A0Fg7yuyRZ3/CNCo0+nTwhyDANktMw78AM5ssHCs75UxC+vHWibHHBmXJzrF
8WD2LyjPSNI=
=dR25
-----END PGP SIGNATURE-----
Import Notes
Reply to msg id not found: ThomasSwansmessageofWed02Aug2000222307-0500
Tom Lane writes:
I thought about that last night. We could do a configure test. Since
it'd be probing random() results there'd be a small probability of
failure, but if we wire in an assumption that the max value must be
2^(15 + n*16)-1 for some n, ten or so probes would give us a failure
probability on the order of 2^-160, which ought to satisfy anyone.However, in the absence of any hard evidence that there are platforms
where the value is different from 2^31-1, it's probably just a waste of
configuration cycles at the moment.I suggest we add a config.h constant like
/* The local random() function yields values 0 .. MAX_RANDOM_VALUE */
#define MAX_RANDOM_VALUE <2^31-1>and use that in the code. Then, if we ever find a platform where
random() does actually produce 64-bit results, it'll be time enough
to crank up a real configure test to set the value.Comments?
If any platform *does* produced 64-bit results, it won't be compliant
with SUSv2 which states explicitly that the resulting range is up to
2^31-1. Since most portability problems are with older platforms which
haven't caught up, I'd be hopeful that any new 64-bit-int platforms
would get it right from the outset. Maybe I'm being over-optimistic :-)
--Malcolm
--
Malcolm Beattie <mbeattie@sable.ox.ac.uk> I am looking for a Linux (and
Unix Systems Programmer maybe Apache/mod_perl) job/contract
Oxford University Computing Services http://users.ox.ac.uk/~mbeattie/cv.html
Roland Roberts <roberts@panix.com> writes:
Call random() several times and test the maximum value against your
thresholds of 2^15 and 2^31. If random() is generating values in the
range 1:2^31-1, you would expect half of your values to be greater
than 2^15-1; more importantly, if you generate, say, 10 values, you
expect only a 1:1024 chance that they are all below 2^15.
Actually the odds are far better than that. If the range is 2^31-1
then only about 2^-16th of the outputs should be less than 2^15.
So ten probes gives you a failure probability of about 2^-160 not
2^-10.
Generalizing, you could tell the difference between widths of 31,
47, or 63 bits with the same level of reliability.
regards, tom lane
-----BEGIN PGP SIGNED MESSAGE-----
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Actually the odds are far better than that. If the range is
Tom> 2^31-1 then only about 2^-16th of the outputs should be less
Tom> than 2^15. So ten probes gives you a failure probability of
Tom> about 2^-160 not 2^-10.
Oops, 2^16 != 2^32 / 2.
So a dynamic test is not only possible but wouldn't cost to much at
configure time.
roland
- --
PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD Unix Software Solutions
roberts@panix.com 76-15 113th Street, Apt 3B
rbroberts@acm.org Forest Hills, NY 11375
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3a
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface
iQCVAwUBOYmtf+oW38lmvDvNAQGWYwP/eXRtrDPu/xN+W9pCd9y34d4jbrPH7jku
nBAuSYtCRyoMgTkjdCtqThzq3vzPLDwfmOZcmWP8W5AmQPJjvcdFwI7y1XgGlaxd
aAIlqqf+TTkZwIUh2vnWTuu5JKkiAZI6UuzNSzy79O/frxKE2y97zCuMw02I0kMK
iGNSybN3L5w=
=36yP
-----END PGP SIGNATURE-----
Import Notes
Reply to msg id not found: TomLanesmessageofThu03Aug2000114539-0400
Tom Lane <tgl@sss.phg.pa.us> writes:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
The Linux man pages indicate that the behavior and underlying
implementation of random() and rand() are the same (so I just picked
one).Ah, well, there's your problem. Whoever did this part of the library
on Linux took shortcuts. On older-line systems, rand() is a
considerably older and crummier generator than random(). It would
definitely not be a wise decision to use rand() instead.I believe using random() is the right thing. The portability bug here
is the assumption that RAND_MAX applies to random() (or is even defined;
none of the man pages I've looked at so far mention it). But all the
machines say that the output of random() is 31 bits, so INT_MAX should
work.
On an i386 machine, certainly; but not on an Alpha or a Sparc.
Probably safer to use (2**31)-1, which is what my (NetBSD) man page
says.
FWIW, I believe random(3) running on NetBSD/alpha, for example, will
return a 31-bit result.
Chris
--
---------------------------------------------------- cjones@rightnowtech.com
Chris Jones
System Administrator, RightNow Technologies
"Is this going to be a stand-up programming session, sir, or another bug hunt?"
On Thu, Aug 03, 2000 at 11:45:39AM -0400, Tom Lane wrote:
Actually the odds are far better than that. If the range is 2^31-1
then only about 2^-16th of the outputs should be less than 2^15.
So ten probes gives you a failure probability of about 2^-160 not
2^-10.
It occurs to me that Perl has to provide a portable rand function, so
I looked at how its Configure script works. It's pretty much what
you've been discussing. First it checks for a couple of possible
random functions (preferring drand48(), then random(), then bitching
and using rand()).
int main()
{
register int i;
register unsigned long tmp;
register unsigned long max = 0L;
for (i = 1000; i; i--) {
tmp = (unsigned long) $randfunc();
if (tmp > max) max = tmp;
}
for (i = 0; max; i++)
max /= 2;
printf("%d\n",i);
}
Oh well.
--
Christopher Masto Senior Network Monkey NetMonger Communications
chris@netmonger.net info@netmonger.net http://www.netmonger.net
Free yourself, free your machine, free the daemon -- http://www.freebsd.org/
<html><div>Perhaps a little off topic.</div>
<br>
<div>I had been using int8 as primary / foreign keys pairs.
After joining several large tables on the keys, I noticed about a 30/40%
speed improvement just in changing the keys from int8 to int4 data
types.</div>
<br>
Should it affect it that much?
<br>
- <br>
- <b><u>Thomas Swan</b>
<x-tab> </x-tab><x-tab> </x-tab><x-tab> </x-tab><x-tab> </x-tab><x-tab> </x-tab><br>
</u>- Graduate Student - Computer Science<br>
- The University of Mississippi<br>
- <br>
- "People can be categorized into two fundamental <br>
- groups, those that divide people into two groups <br>
- and those that don't."</html>
Tom Lane writes:
Oh, that's interesting. What platform do you use? If RAND_MAX applies
to random() on some machines that'd probably explain why the code is
written like it is. But on my box (HPUX) the rand() function is old
and crufty and considerably different from random().
rand() and RAND_MAX are defined by ANSI C, random() is a BSD-ism. I
suggest you use the former. Also, while you're at it, this is a snippet
from the C FAQ:
13.16: How can I get random integers in a certain range?
A: The obvious way,
rand() % N /* POOR */
(which tries to return numbers from 0 to N-1) is poor, because
the low-order bits of many random number generators are
distressingly *non*-random. (See question 13.18.) A better
method is something like
(int)((double)rand() / ((double)RAND_MAX + 1) * N)
If you're worried about using floating point, you could use
rand() / (RAND_MAX / N + 1)
Both methods obviously require knowing RAND_MAX (which ANSI
#defines in <stdlib.h>), and assume that N is much less than
RAND_MAX.
(Note, by the way, that RAND_MAX is a *constant* telling you
what the fixed range of the C library rand() function is. You
cannot set RAND_MAX to some other value, and there is no way of
requesting that rand() return numbers in some other range.)
If you're starting with a random number generator which returns
floating-point values between 0 and 1, all you have to do to get
integers from 0 to N-1 is multiply the output of that generator
by N.
References: K&R2 Sec. 7.8.7 p. 168; PCS Sec. 11 p. 172.
--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
Import Notes
Resolved by subject fallback
I had been using int8 as primary / foreign keys pairs. After joining
several large tables on the keys, I noticed about a 30/40% speed
improvement just in changing the keys from int8 to int4 data types.
Apparently it should be that much (though this is the first report
either way). Here are some reasons why there is the difference:
1) int4 is "pass by value", int8 is "pass by reference". So int8 hits
palloc() every time you generate a new one, whereas int4 may just copy
the data itself.
2) int8 is implemented in software libraries, at least on 32-bit
machines. int4 is implemented in hardware. Libraries are slower than
~1cycle hardware operations.
- Thomas
Peter Eisentraut <peter_e@gmx.net> writes:
rand() and RAND_MAX are defined by ANSI C, random() is a BSD-ism. I
suggest you use the former.
Unfortunately, except on a few platforms like Linux, the typical
rand() implementation is vastly inferior to the typical random()
implementation. BSD wouldn't have bothered to roll their own if
the older code hadn't been so godawful. But unless you are using
glibc, you probably have a bug-compatible rand().
regards, tom lane