random generated string matching index in inexplicable ways

Started by Myles Milleralmost 7 years ago5 messagesgeneral
Jump to latest
#1Myles Miller
pg@q7r7.com

PROBLEM:
Strings or characters generated by any random function (including pg_crypto
gen_random_bytes) are matching a string/char index in surprising ways.

Reduced down to its simplest example:

-- use random 0 or 1, plus 65, to get 'A' or 'B'

# SELECT chr(round(random())::int + 65);
chr
-----
B
(1 row)

# SELECT chr(round(random())::int + 65);
chr
-----
A
(1 row)

-- simple table for matching:
CREATE TABLE x( y char(1) primary key );
INSERT INTO x(y) VALUES ('A');
INSERT INTO x(y) VALUES ('B');

-- if I query 'A' or 'B' it works as expected

# SELECT y FROM x WHERE y = 'A';
y
---
A
(1 row)

# SELECT y FROM x WHERE y = 'B';
y
---
B
(1 row)

-- if we use random-generated 'A' or 'B', things get inexplicable

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y
---
A
B
(2 rows)

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y
---
(0 rows)

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y
---
B
(1 row)

I've been wrestling with this for hours. Any suggestions? Thank you.

#2Alban Hertroys
haramrae@gmail.com
In reply to: Myles Miller (#1)
Re: random generated string matching index in inexplicable ways

On 7 May 2019, at 13:53, Myles Miller <pg@q7r7.com> wrote:

PROBLEM:
Strings or characters generated by any random function (including pg_crypto
gen_random_bytes) are matching a string/char index in surprising ways.

I fail to see anything surprising in your examples.

Reduced down to its simplest example:

(…)

-- if we use random-generated 'A' or 'B', things get inexplicable

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y
---
A
B
(2 rows)

Here you got a random value in the lower range of 0..1 for the record with value ‘A’, so that’s a match, and one in the higher range for value ‘B’, a match again, so you get 2 rows.

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y
---
(0 rows)

Here you got a random value in the higher range for the record with value ‘A’, so no match, and one in the lower range for value ‘B’, no match again, so you get 0 rows.

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y
---
B
(1 row)

Here you got two random values in the higher range, so only the row with ‘B’ matches.

You could also get two random values in the lower range and only get a match against ‘A’.

Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

#3Myles Miller
pg@q7r7.com
In reply to: Alban Hertroys (#2)
Re: random generated string matching index in inexplicable ways

# SELECT y FROM x WHERE y = chr(round(random())::int + 65);
y
---
A
B
(2 rows)

Here you got a random value in the lower range of 0..1 for the record with value ‘A’, so that’s a match, and one in the higher range for value ‘B’, a match again, so you get 2 rows.

No, the function is returning just one letter, either 'A' or 'B', not multiple values.

# SELECT * FROM chr(round(random())::int + 65);
chr
-----
B
(1 row)

# SELECT * FROM chr(round(random())::int + 65);
chr
-----
A
(1 row)

#4Myles Miller
pg@q7r7.com
In reply to: Myles Miller (#1)
Re: random generated string matching index in inexplicable ways [EXT]

On Tue, May 07, 2019 at 12:17:12PM +0000, Daniel Perrett wrote:

The WHERE expression (and therefore the random function) is being evaluated once per row, not once per query.
If you run this query a few times, you will see the problem:
SELECT y, chr(round(random())::int + 65) FROM (SELECT 'A' UNION SELECT 'B') x(y);

Ah! Thank you! I understand now.

#5Francisco Olarte
folarte@peoplecall.com
In reply to: Myles Miller (#3)
Re: random generated string matching index in inexplicable ways

On Tue, May 7, 2019 at 3:12 PM Myles Miller <pg@q7r7.com> wrote:

No, the function is returning just one letter, either 'A' or 'B', not multiple values.

Your random function is being evaluated ONCE FOR EACH ROW.

i.e, it's doing something like:
select y from
( SELECT y, chr(round(random())::int + 65) as z FROM x ) aux
WHERE y = z;
o
with aux as ( SELECT y, chr(round(random())::int + 65) as z FROM x )
select y from aux WHERE y = z;

You may want to try something like this:

with aux as ( SELECT chr(round(random())::int + 65) as z )
select y from x,aux WHERE y = z;

( Untested, but that's the idea )

Francisco Olarte