surprising results with random()
Hi,
I have a view in which I want to randomly assign values if certain
conditions hold. I was getting surprising results. Here is a (very)
simplified version of the view, which seems to indicate the problem:
CREATE OR REPLACE VIEW test_view AS
SELECT
CASE
WHEN random() < .3333 THEN '1'
WHEN random() < .3333 THEN '2'
ELSE '3'
END AS test_value
FROM client ;
It seems this should generate a random number between 0 and 1, and set
test_value to '1' if this first generated number is less than .3333.
Otherwise, it should generate another random number, and set test_value
to '2' if this is less than .3333. And if neither of the random numbers
are less than .3333, it should set test_value to '3'. It seems to me
that there should be a relative even distribution of the 3 values.
However when I run this, the values are always similar to what is below:
X_test=> select test_value, count(*) from test_view group by 1 order by 1;
test_value | count
------------+-------
1 | 23947
2 | 16061
3 | 32443
Why are there significantly fewer 2s? I understand that random() is not
truly random, and that the seed affects this value. But it still
confuses me that, no matter how many times I run this, there are always
so few 2s. If it is generating an independent random number in the
second call to random(), then I don't know why there are more so many
more 1s than 2s.
Thanks!
-jessi
--
Jessi Berkelhammer
Downtown Emergency Service Center
Computer Programming Specialist
On Feb 23, 2009, at 2:09 PM, Jessi Berkelhammer wrote:
Hi,
I have a view in which I want to randomly assign values if certain
conditions hold. I was getting surprising results. Here is a (very)
simplified version of the view, which seems to indicate the problem:CREATE OR REPLACE VIEW test_view AS
SELECT
CASE
WHEN random() < .3333 THEN '1'
WHEN random() < .3333 THEN '2'
ELSE '3'
END AS test_valueFROM client ;
It seems this should generate a random number between 0 and 1, and set
test_value to '1' if this first generated number is less than .3333.
Otherwise, it should generate another random number, and set
test_value
to '2' if this is less than .3333. And if neither of the random
numbers
are less than .3333, it should set test_value to '3'. It seems to me
that there should be a relative even distribution of the 3 values.However when I run this, the values are always similar to what is
below:X_test=> select test_value, count(*) from test_view group by 1
order by 1;
test_value | count
------------+-------
1 | 23947
2 | 16061
3 | 32443Why are there significantly fewer 2s? I understand that random() is
not
truly random, and that the seed affects this value. But it still
confuses me that, no matter how many times I run this, there are
always
so few 2s. If it is generating an independent random number in the
second call to random(), then I don't know why there are more so many
more 1s than 2s.
Nope, it's nothing to do with random(), it's that your maths is wrong.
There are 9 possible cases. In 3 of them you return 1. In 2 of them you
return 2. In the remaining 4 cases you return 3.
If you were to run this 72451 times I'd expect to see
1: 24150 = 72451 * 3/9
2: 16100 = 72451 * 2/9
3: 32200 = 72451 * 4/9
Which, unsurprisingly, is fairly close to what you get.
Cheers,
Steve
Jessi,
should the function not look like this???
CREATE OR REPLACE VIEW test_view AS
SELECT
CASE
WHEN random() < .333333333 THEN '1'
WHEN random() < .5 THEN '2'
ELSE '3'
END AS test_value
FROM client;
On Feb 23, 2009, at 5:09 PM, Jessi Berkelhammer wrote:
Show quoted text
Hi,
I have a view in which I want to randomly assign values if certain
conditions hold. I was getting surprising results. Here is a (very)
simplified version of the view, which seems to indicate the problem:CREATE OR REPLACE VIEW test_view AS
SELECT
CASE
WHEN random() < .3333 THEN '1'
WHEN random() < .3333 THEN '2'
ELSE '3'
END AS test_valueFROM client ;
It seems this should generate a random number between 0 and 1, and set
test_value to '1' if this first generated number is less than .3333.
Otherwise, it should generate another random number, and set
test_value
to '2' if this is less than .3333. And if neither of the random
numbers
are less than .3333, it should set test_value to '3'. It seems to me
that there should be a relative even distribution of the 3 values.However when I run this, the values are always similar to what is
below:X_test=> select test_value, count(*) from test_view group by 1
order by 1;
test_value | count
------------+-------
1 | 23947
2 | 16061
3 | 32443Why are there significantly fewer 2s? I understand that random() is
not
truly random, and that the seed affects this value. But it still
confuses me that, no matter how many times I run this, there are
always
so few 2s. If it is generating an independent random number in the
second call to random(), then I don't know why there are more so many
more 1s than 2s.Thanks!
-jessi--
Jessi Berkelhammer
Downtown Emergency Service Center
Computer Programming Specialist
Or perhaps:
CREATE OR REPLACE VIEW test_view AS
SELECT (random()*3)::int as test_value;
At least in this case, that should give the same result.
in this case 1/3 should be 1, 1/3 = 2 & 1/3=3
in your case 1/3 = 1, 1/2 the remainder (1/2 * 2/3 = 1/3) = 2, remaining 1/3 = 3
Although I'm guessing the original intent is to NOT generate an equal distribution, but I'm not sure what distribution is required.
Cheers,
Brent Wood
Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
ries van Twisk <pg@rvt.dds.nl> 02/24/09 12:13 PM >>>
Jessi,
should the function not look like this???
CREATE OR REPLACE VIEW test_view AS
SELECT
CASE
WHEN random() < .333333333 THEN '1'
WHEN random() < .5 THEN '2'
ELSE '3'
END AS test_value
FROM client;
On Feb 23, 2009, at 5:09 PM, Jessi Berkelhammer wrote:
Hi,
I have a view in which I want to randomly assign values if certain
conditions hold. I was getting surprising results. Here is a (very)
simplified version of the view, which seems to indicate the problem:CREATE OR REPLACE VIEW test_view AS
SELECT
CASE
WHEN random() < .3333 THEN '1'
WHEN random() < .3333 THEN '2'
ELSE '3'
END AS test_valueFROM client ;
It seems this should generate a random number between 0 and 1, and set
test_value to '1' if this first generated number is less than .3333.
Otherwise, it should generate another random number, and set
test_value
to '2' if this is less than .3333. And if neither of the random
numbers
are less than .3333, it should set test_value to '3'. It seems to me
that there should be a relative even distribution of the 3 values.However when I run this, the values are always similar to what is
below:X_test=> select test_value, count(*) from test_view group by 1
order by 1;
test_value | count
------------+-------
1 | 23947
2 | 16061
3 | 32443Why are there significantly fewer 2s? I understand that random() is
not
truly random, and that the seed affects this value. But it still
confuses me that, no matter how many times I run this, there are
always
so few 2s. If it is generating an independent random number in the
second call to random(), then I don't know why there are more so many
more 1s than 2s.Thanks!
-jessi--
Jessi Berkelhammer
Downtown Emergency Service Center
Computer Programming Specialist
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.
Import Notes
Resolved by subject fallback
Steve Atkins wrote:
On Feb 23, 2009, at 2:09 PM, Jessi Berkelhammer wrote:
Hi,
I have a view in which I want to randomly assign values if certain
conditions hold. I was getting surprising results. Here is a (very)
simplified version of the view, which seems to indicate the problem:CREATE OR REPLACE VIEW test_view AS
SELECT
CASE
WHEN random() < .3333 THEN '1'
WHEN random() < .3333 THEN '2'
ELSE '3'
END AS test_valueFROM client ;
It seems this should generate a random number between 0 and 1, and set
test_value to '1' if this first generated number is less than .3333.
Otherwise, it should generate another random number, and set
test_value
to '2' if this is less than .3333. And if neither of the random
numbers
are less than .3333, it should set test_value to '3'. It seems to me
that there should be a relative even distribution of the 3 values.However when I run this, the values are always similar to what is
below:X_test=> select test_value, count(*) from test_view group by 1
order by 1;
test_value | count
------------+-------
1 | 23947
2 | 16061
3 | 32443Why are there significantly fewer 2s? I understand that random() is
not
truly random, and that the seed affects this value. But it still
confuses me that, no matter how many times I run this, there are
always
so few 2s. If it is generating an independent random number in the
second call to random(), then I don't know why there are more so many
more 1s than 2s.Nope, it's nothing to do with random(), it's that your maths is wrong.
There are 9 possible cases. In 3 of them you return 1. In 2 of them you
return 2. In the remaining 4 cases you return 3.If you were to run this 72451 times I'd expect to see
1: 24150 = 72451 * 3/9
2: 16100 = 72451 * 2/9
3: 32200 = 72451 * 4/9Which, unsurprisingly, is fairly close to what you get.
Cheers,
Steve
this looks like an attempt to understand the monty hall problem.
http://en.wikipedia.org/wiki/Monty_Hall_problem
except that there's no goat :(
cheers,
raf
On Mon, Feb 23, 2009 at 5:16 PM, raf <raf@raf.org> wrote:
this looks like an attempt to understand the monty hall problem.
http://en.wikipedia.org/wiki/Monty_Hall_problem
except that there's no goat :(
And what database management system can be complete without a goat? :)