Strange random() Correlation

Started by Volkan YAZICIalmost 20 years ago3 messagesbugs
Jump to latest
#1Volkan YAZICI
yazicivo@ttnet.net.tr

Hi,

ISTM, there's a problem in the correlation of random() to outer JOINs.
Here's a test case:

BEGIN;

CREATE TEMP TABLE nuc_codes (id serial, code char(1));

COPY nuc_codes (code) FROM stdin;
A
C
D
G
H
K
M
N
R
S
T
U
V
W
X
Y
\.

SELECT id, code FROM nuc_codes;

SELECT T1.r1, T1.r2, T2.code, T3.code
FROM (SELECT ((random() * 100)::int4 % 17),
((random() * 100)::int4 % 17)
FROM generate_series(1, 10)
) AS T1 (r1, r2)
LEFT OUTER JOIN nuc_codes T2 ON (T2.id = T1.r1)
LEFT OUTER JOIN nuc_codes T3 ON (T3.id = T1.r2);

ROLLBACK;

If you run above query, you'll realize the inconsistency in the output.
Furthermore, if you'd append an "OFFSET 0" to subselect, output becomes
more stable but still has some inconsistencies.

Regards.

P.S. Query tested on 8.1.4 and a 2-3 weeks old cvs tip.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Volkan YAZICI (#1)
Re: Strange random() Correlation

Volkan YAZICI <yazicivo@ttnet.net.tr> writes:

ISTM, there's a problem in the correlation of random() to outer JOINs.

The random() functions are being evaluated more than once because the
subselect gets "flattened" into the outer query, so that you have
the equivalent of

select random(), ... where t2.id = random() ...

We've previously discussed preventing the planner from flattening if
there are any volatile functions in the sub-select's output list, but
I think that would probably do about as much harm as good. The cases
where this actually matters are rare and the programmer knows darn well
what he's doing, so the workaround of inserting an OFFSET 0 to prevent
the flattening seems acceptable to me.

Furthermore, if you'd append an "OFFSET 0" to subselect, output becomes
more stable but still has some inconsistencies.

I didn't see any...

regards, tom lane

#3Volkan YAZICI
yazicivo@ttnet.net.tr
In reply to: Tom Lane (#2)
Re: Strange random() Correlation

On May 27 11:50, Tom Lane wrote:

Volkan YAZICI <yazicivo@ttnet.net.tr> writes:

ISTM, there's a problem in the correlation of random() to outer JOINs.

The random() functions are being evaluated more than once because the
subselect gets "flattened" into the outer query, so that you have
the equivalent of

select random(), ... where t2.id = random() ...

Oops, sorry. I've just remembered this.

We've previously discussed preventing the planner from flattening if
there are any volatile functions in the sub-select's output list, but
I think that would probably do about as much harm as good.

It can be quite informative to learn the pros and cons of this issue,
but I couldn't find related discussion in archives. I'd be so
appreciated if you can remember its subject or anything specifier for
the thread.

Furthermore, if you'd append an "OFFSET 0" to subselect, output becomes
more stable but still has some inconsistencies.

I didn't see any...

That's all caused by a mis-interpretation of the output by me. Replacing
"(random() * 100)::int % 17" with "1 + ((random() * 100)::int % 16)"
solved my above question.

Regards.