Default ordering option

Started by Cyril Champierover 6 years ago16 messagesgeneral

cyril.champier@doctolib.com

over 6 years ago

Hi,

In this documentation
<https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:

If sorting is not chosen, the rows will be returned in an unspecified
order. The actual order in that case will depend on the scan and join plan
types and the order on disk, but it must not be relied on.

I would like to know if there is any way to change that to have a "real"
random behaviour.

My use case:
At Doctolib, we do a lot of automatic tests.
Sometimes, people forgot to order their queries. Most of the time, there is
no real problem on production. Let say, we display a user list without
order.
When a developer writes a test for this feature, he will create 2 users A
and B, then assert that they are displayed "[A, B]".
99% of the time the test will be ok, but sometimes, the displayed list will
be "[B,A]", and the test will fail.

One solution could be to ensure random order with an even distribution, so
that such failing test would be detected quicker.

Is that possible? Maybe with a plugin?

Thanks,
Cyril

Adrian Klaver

adrian.klaver@aklaver.com

over 6 years ago

In reply to: Cyril Champier (#1)

Re: Default ordering option

On 7/23/19 8:43 AM, Cyril Champier wrote:

Hi,

In this documentation
<https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:

If sorting is not chosen, the rows will be returned in an
unspecified order. The actual order in that case will depend on the
scan and join plan types and the order on disk, but it must not be
relied on.

I would like to know if there is any way to change that to have a "real"
random behaviour.

My use case:
At Doctolib, we do a lot of automatic tests.
Sometimes, people forgot to order their queries. Most of the time, there
is no real problem on production. Let say, we display a user list
without order.
When a developer writes a test for this feature, he will create 2 users
A and B, then assert that they are displayed "[A, B]".
99% of the time the test will be ok, but sometimes, the displayed list
will be "[B,A]", and the test will fail.

One solution could be to ensure random order with an even distribution,
so that such failing test would be detected quicker.

Is that possible? Maybe with a plugin?

Not that I know of.

A possible solution given below:

test_(postgres)> insert into t1 values (1, 'dog'), (2, 'cat'), (3, 'fish');
INSERT 0 3

test_(postgres)> select * from t1 ;
a | b
---+------
1 | dog
2 | cat
3 | fish
(3 rows)

test_(postgres)> update t1 set b = 'dogfish' where a =1;
UPDATE 1

test_(postgres)> select * from t1 ;
a | b
---+---------
2 | cat
3 | fish
1 | dogfish
(3 rows)

An UPDATE reorders the rows. Maybe throw an UPDATE into the test after
creating the users to force an 'out of order' result?

Thanks,
Cyril

--
Adrian Klaver
adrian.klaver@aklaver.com

Ian Lawrence Barwick

barwick@gmail.com

over 6 years ago

In reply to: Adrian Klaver (#2)

Re: Default ordering option

On 7/24/19 2:23 AM, Adrian Klaver wrote:

On 7/23/19 8:43 AM, Cyril Champier wrote:

Hi,

In this documentation <https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:

    If sorting is not chosen, the rows will be returned in an
    unspecified order. The actual order in that case will depend on the
    scan and join plan types and the order on disk, but it must not be
    relied on.

I would like to know if there is any way to change that to have a "real" random behaviour.

My use case:
At Doctolib, we do a lot of automatic tests.
Sometimes, people forgot to order their queries. Most of the time, there is no real problem on production. Let say, we display a user list without order.
When a developer writes a test for this feature, he will create 2 users A and B, then assert that they are displayed "[A, B]".
99% of the time the test will be ok, but sometimes, the displayed list will be "[B,A]", and the test will fail.

One solution could be to ensure random order with an even distribution, so that such failing test would be detected quicker.

Is that possible? Maybe with a plugin?

Not that I know of.

A possible solution given below:

test_(postgres)> insert into t1 values (1, 'dog'), (2, 'cat'), (3, 'fish');
INSERT 0 3

test_(postgres)> select * from t1 ;
a | b
---+------
1 | dog
2 | cat
3 | fish
(3 rows)

test_(postgres)> update t1 set b = 'dogfish' where a =1;
UPDATE 1

test_(postgres)> select * from t1 ;
a |    b
---+---------
2 | cat
3 | fish
1 | dogfish
(3 rows)

An UPDATE reorders the rows. Maybe throw an UPDATE into the test after creating the users to force an 'out of order' result?

An UPDATE without changing any values should have the same effect, e.g. :

UPDATE t1 SET b = b WHERE a = 1;

Something like this

WITH x AS (SELECT * FROM t1 ORDER BY a DESC)
UPDATE t1 t
SET a = t.a
FROM x
WHERE t.a = x.a

would shuffle the rows into reverse order, which might be enough to catch
out any missing ORDER BY (this assumes nothing else will touch the table
and reorder it before the test is run).

You could also order by RANDOM() but there would be a chance the rows would
end up in sequential order.

Regards

Ian Barwick

--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Cyril Champier

cyril.champier@doctolib.com

over 6 years ago

In reply to: Ian Lawrence Barwick (#3)

Re: Default ordering option

Thanks for your answers.
Unfortunately the update trick only seems to work under certain conditions.

I do this to shuffle my patients table:
UPDATE "patients"
SET "updated_at" = NOW()
WHERE "patients"."id" = (SELECT "patients"."id" FROM "patients" ORDER BY
random() LIMIT 1)

Then indeed, this query returns different order:
SELECT *
FROM "patients"

But this one (because it use an index?) always returns values in the same
order:
SELECT "id"
FROM "patients"

And for the other suggestion, I cannot blindly add 'ORDER BY random()' to
every select,
because of the incompatibility with distinct and union, and the way we use
our orm.

On Wed, Jul 24, 2019 at 3:54 AM Ian Barwick <ian.barwick@2ndquadrant.com>
wrote:

Show quoted text

On 7/24/19 2:23 AM, Adrian Klaver wrote:

On 7/23/19 8:43 AM, Cyril Champier wrote:

Hi,

In this documentation <

https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:

If sorting is not chosen, the rows will be returned in an
unspecified order. The actual order in that case will depend on the
scan and join plan types and the order on disk, but it must not be
relied on.

I would like to know if there is any way to change that to have a

"real" random behaviour.

My use case:
At Doctolib, we do a lot of automatic tests.
Sometimes, people forgot to order their queries. Most of the time,

there is no real problem on production. Let say, we display a user list
without order.

When a developer writes a test for this feature, he will create 2 users

A and B, then assert that they are displayed "[A, B]".

99% of the time the test will be ok, but sometimes, the displayed list

will be "[B,A]", and the test will fail.

One solution could be to ensure random order with an even distribution,

so that such failing test would be detected quicker.

Is that possible? Maybe with a plugin?

Not that I know of.

A possible solution given below:

test_(postgres)> insert into t1 values (1, 'dog'), (2, 'cat'), (3,

'fish');

INSERT 0 3

test_(postgres)> select * from t1 ;
a | b
---+------
1 | dog
2 | cat
3 | fish
(3 rows)

test_(postgres)> update t1 set b = 'dogfish' where a =1;
UPDATE 1

test_(postgres)> select * from t1 ;
a | b
---+---------
2 | cat
3 | fish
1 | dogfish
(3 rows)

An UPDATE reorders the rows. Maybe throw an UPDATE into the test after

creating the users to force an 'out of order' result?

An UPDATE without changing any values should have the same effect, e.g. :

UPDATE t1 SET b = b WHERE a = 1;

Something like this

WITH x AS (SELECT * FROM t1 ORDER BY a DESC)
UPDATE t1 t
SET a = t.a
FROM x
WHERE t.a = x.a

would shuffle the rows into reverse order, which might be enough to catch
out any missing ORDER BY (this assumes nothing else will touch the table
and reorder it before the test is run).

You could also order by RANDOM() but there would be a chance the rows would
end up in sequential order.

Regards

Ian Barwick

--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Adrian Klaver

adrian.klaver@aklaver.com

over 6 years ago

In reply to: Cyril Champier (#4)

Re: Default ordering option

On 7/24/19 1:45 AM, Cyril Champier wrote:

Thanks for your answers.
Unfortunately the update trick only seems to work under certain conditions.

I do this to shuffle my patients table:
UPDATE "patients"
SET "updated_at" = NOW()
WHERE "patients"."id" = (SELECT "patients"."id" FROM "patients" ORDER BY
random() LIMIT 1)

Then indeed, this query returns different order:
SELECT *
FROM "patients"

But this one (because it use an index?) always returns values in the
same order:
SELECT "id"
FROM "patients"

Hmm, I don't see that:

test=# select * from t1;
a | b
---+---------
2 | cat
3 | fish
1 | dogfish
(3 rows)

test=# select a from t1;
a
---
2
3
1
(3 rows)

Are you sure there is nothing going on between the first and second
queries e.g. ROLLBACK?

And for the other suggestion, I cannot blindly add 'ORDER BY random()'
to every select,
because of the incompatibility with distinct and union, and the way we
use our orm.

Are you talking about the production or test queries above?

--
Adrian Klaver
adrian.klaver@aklaver.com

Cyril Champier

cyril.champier@doctolib.com

over 6 years ago

In reply to: Adrian Klaver (#5)

Re: Default ordering option

Indeed, you are right, I do my test in pure sql and via ruby ActiveRecord,
and I must had been confused,
the behaviour is correct in sql, it must have been a cache thing in
ActiveRecord that prevented the reordering.
But meanwhile, I tested on our whole CI, and it took twice the normal time
with updates to shuffle DB :(

For the union, I speak about production code like this:
"select count(*) from (#{directory_doctors_query_sql} union all
#{profiles_query_sql}) as doctors"
In the to_sql, we cannot detect that we will be injected into a union.
So I cannot blindly add the random in the to_sql method.

On Wed, Jul 24, 2019 at 4:48 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 7/24/19 1:45 AM, Cyril Champier wrote:

Thanks for your answers.
Unfortunately the update trick only seems to work under certain

conditions.

I do this to shuffle my patients table:
UPDATE "patients"
SET "updated_at" = NOW()
WHERE "patients"."id" = (SELECT "patients"."id" FROM "patients" ORDER BY
random() LIMIT 1)

Then indeed, this query returns different order:
SELECT *
FROM "patients"

But this one (because it use an index?) always returns values in the
same order:
SELECT "id"
FROM "patients"

Hmm, I don't see that:

test=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
a | integer | | not null |
b | character varying | | |
Indexes:
"t1_pkey" PRIMARY KEY, btree (a)

test=# select * from t1;
a | b
---+---------
2 | cat
3 | fish
1 | dogfish
(3 rows)

test=# select a from t1;
a
---
2
3
1
(3 rows)

Are you sure there is nothing going on between the first and second
queries e.g. ROLLBACK?

And for the other suggestion, I cannot blindly add 'ORDER BY random()'
to every select,
because of the incompatibility with distinct and union, and the way we
use our orm.

Are you talking about the production or test queries above?

--
Adrian Klaver
adrian.klaver@aklaver.com

Adrian Klaver

adrian.klaver@aklaver.com

over 6 years ago

In reply to: Cyril Champier (#6)

Re: Default ordering option

On 7/24/19 8:22 AM, Cyril Champier wrote:

Indeed, you are right, I do my test in pure sql and via ruby
ActiveRecord, and I must had been confused,
the behaviour is correct in sql, it must have been a cache thing in
ActiveRecord that prevented the reordering.
But meanwhile, I tested on our whole CI, and it took twice the normal
time with updates to shuffle DB :(

Well there extra operations so that is to be expected. The question is
whether everything needs to be shuffled? It would seem you only need to
do that for those tests that are expecting a set order. I went back and
read your original post and am somewhat confused about that test. You said:

"Sometimes, people forgot to order their queries. Most of the time,
there is no real problem on production...."

If order is not an issue in the production code why test for it in the
test code?

For the union, I speak about production code like this:
"select count(*) from (#{directory_doctors_query_sql} union all
#{profiles_query_sql}) as doctors"
In the to_sql, we cannot detect that we will be injected into a union.
So I cannot blindly add the random in the to_sql method.

The below would not be affected by an order by in any case as the
count() would be the same:

"select count(*) from (#{directory_doctors_query_sql} union all
#{profiles_query_sql}) as doctors"

If you did want to use order by random() could you not just tack it on
the end?:

"... as doctors order by random()"

--
Adrian Klaver
adrian.klaver@aklaver.com

Peter Eisentraut

peter_e@gmx.net

over 6 years ago

In reply to: Cyril Champier (#1)

Re: Default ordering option

On 2019-07-23 17:43, Cyril Champier wrote:

In this documentation
<https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:

If sorting is not chosen, the rows will be returned in an
unspecified order. The actual order in that case will depend on the
scan and join plan types and the order on disk, but it must not be
relied on.

I would like to know if there is any way to change that to have a "real"
random behaviour.

It might be an interesting exercise to implement this as a post-parsing
hook.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Cyril Champier

cyril.champier@doctolib.com

over 6 years ago

In reply to: Peter Eisentraut (#8)

Re: Default ordering option

*Adrian*:

If order is not an issue in the production code why test for it in the
test code?

In many cases, it would not be a problem in tests if we had an unordered
array comparison helper.
But in other cases, it is a production issue.
In ruby ActiveRecord for exemple, you can do `Patient.find_by(last_name:
'champier')`,
which translates to `SELECT "patients".* FROM "patients" WHERE
"patients"."last_name" = 'champier' LIMIT 1`.
If last_name is not unique, the returned record will be random.

So yes, everything as to be randomized, because the sources are multiples
and the consequences can vary to a dramatic production bug, a failed CI 1%
of the time, or to a useless test assertion.

*Peter*:

It might be an interesting exercise to implement this as a post-parsing
hook.

I known nothing about that, but that sounds interesting, do you have any
documentation pointer to help me implement that?

On Wed, Jul 24, 2019 at 10:36 PM Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

Show quoted text

On 2019-07-23 17:43, Cyril Champier wrote:

In this documentation
<https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:

If sorting is not chosen, the rows will be returned in an
unspecified order. The actual order in that case will depend on the
scan and join plan types and the order on disk, but it must not be
relied on.

I would like to know if there is any way to change that to have a "real"
random behaviour.

It might be an interesting exercise to implement this as a post-parsing
hook.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10

Peter Eisentraut

peter_e@gmx.net

over 6 years ago

In reply to: Cyril Champier (#9)

Re: Default ordering option

On 2019-07-25 09:43, Cyril Champier wrote:

It might be an interesting exercise to implement this as a post-parsing
hook.

I known nothing about that, but that sounds interesting, do you have any
documentation pointer to help me implement that?

Look for post_parse_analyze_hook. Walk the parsed query tree, look for
queries without ordering clause and manufacture one.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11

Cyril Champier

cyril.champier@doctolib.com

over 6 years ago

In reply to: Peter Eisentraut (#10)

Re: Default ordering option

Peter:
So I would need to create a pg extension encapsulating this hook callback?
If this is the case, it seems it will be much more complicated than
expected, and I wont be able to do it :(
But thanks for the suggestion anyway.

On Thu, Jul 25, 2019 at 12:21 PM Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

Show quoted text

On 2019-07-25 09:43, Cyril Champier wrote:

It might be an interesting exercise to implement this as a

post-parsing

hook.

I known nothing about that, but that sounds interesting, do you have any
documentation pointer to help me implement that?

Look for post_parse_analyze_hook. Walk the parsed query tree, look for
queries without ordering clause and manufacture one.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12

Adrian Klaver

adrian.klaver@aklaver.com

over 6 years ago

In reply to: Cyril Champier (#9)

Re: Default ordering option

On 7/25/19 12:43 AM, Cyril Champier wrote:

*Adrian*:

If order is not an issue in the production code why test for it in the
test code?

In many cases, it would not be a problem in tests if we had an unordered
array comparison helper.
But in other cases, it is a production issue.
In ruby ActiveRecord for exemple, you can do `Patient.find_by(last_name:
'champier')`,
which translates to `SELECT "patients".* FROM "patients" WHERE
"patients"."last_name" = 'champier' LIMIT 1`.
If last_name is not unique, the returned record will be random.

Are you really looking for a pseudo-random name?

If so would not(warning not a Ruby developer, so below is tentative):

Patient.where(["last_name = :last_name", {last_name:
"champier"}]).order('RANDOM()').first

work better?

If not why not use something that returns all possible matches?

So yes, everything as to be randomized, because the sources are
multiples and the consequences can vary to a dramatic production bug, a
failed CI 1% of the time, or to a useless test assertion.

One way I can think of doing this is write a script that walks through
your tables in the test db and does an UPDATE across the rows. It is
going to add time to your tests, but then I believe that is going to be
the case for anything you do. Or you could look at something I have
never tried, fuzzy testing. As a starting point:

https://www.guru99.com/fuzz-testing.html

Maybe other folks have suggestions on tools you could use for fuzzy
testing.

*Peter*:

It might be an interesting exercise to implement this as a post-parsing
hook.

I known nothing about that, but that sounds interesting, do you have any
documentation pointer to help me implement that?

On Wed, Jul 24, 2019 at 10:36 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com
<mailto:peter.eisentraut@2ndquadrant.com>> wrote:

On 2019-07-23 17:43, Cyril Champier wrote:

In this documentation
<https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:

If sorting is not chosen, the rows will be returned in an
unspecified order. The actual order in that case will depend

on the

scan and join plan types and the order on disk, but it must

not be

relied on.

I would like to know if there is any way to change that to have a

"real"

random behaviour.

It might be an interesting exercise to implement this as a post-parsing
hook.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Adrian Klaver
adrian.klaver@aklaver.com

#13

Cyril Champier

cyril.champier@doctolib.com

over 6 years ago

In reply to: Adrian Klaver (#12)

Re: Default ordering option

Adrian:

Are you really looking for a pseudo-random name?

No, the code I pasted was an existing production bug: the last_name should
have been unique, so the selected patient would always be the same.
This should have been detected in tests, but since the order was "almost
always the same", our test was green 99% of the time, so we discarded it as
flaky.

Fuzzy testing could be an option, but this would go too far, as for Peter
extension suggestion.
We have huge existing codebase with more than 10K tests, and I do not want
to modify our whole testing strategy.

Meanwhile, I went for an ORM patch (ActiveRecord) and forbid usages that
can workaround it.
If you are interested, here is a gist:
https://gist.github.com/cyrilchampier/fdb945e8a09f93d50c7e89305c2f53f0

I wish there was a simple flag to activate in PostgreSQL to do that!

Thanks for your ideas!

On Thu, Jul 25, 2019 at 4:55 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 7/25/19 12:43 AM, Cyril Champier wrote:

*Adrian*:

If order is not an issue in the production code why test for it in

the

test code?

In many cases, it would not be a problem in tests if we had an unordered
array comparison helper.
But in other cases, it is a production issue.
In ruby ActiveRecord for exemple, you can do `Patient.find_by(last_name:
'champier')`,
which translates to `SELECT "patients".* FROM "patients" WHERE
"patients"."last_name" = 'champier' LIMIT 1`.
If last_name is not unique, the returned record will be random.

Are you really looking for a pseudo-random name?

If so would not(warning not a Ruby developer, so below is tentative):

Patient.where(["last_name = :last_name", {last_name:
"champier"}]).order('RANDOM()').first

work better?

If not why not use something that returns all possible matches?

So yes, everything as to be randomized, because the sources are
multiples and the consequences can vary to a dramatic production bug, a
failed CI 1% of the time, or to a useless test assertion.

One way I can think of doing this is write a script that walks through
your tables in the test db and does an UPDATE across the rows. It is
going to add time to your tests, but then I believe that is going to be
the case for anything you do. Or you could look at something I have
never tried, fuzzy testing. As a starting point:

https://www.guru99.com/fuzz-testing.html

Maybe other folks have suggestions on tools you could use for fuzzy
testing.

*Peter*:

It might be an interesting exercise to implement this as a

post-parsing

hook.

I known nothing about that, but that sounds interesting, do you have any
documentation pointer to help me implement that?

On Wed, Jul 24, 2019 at 10:36 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com
<mailto:peter.eisentraut@2ndquadrant.com>> wrote:

On 2019-07-23 17:43, Cyril Champier wrote:

In this documentation
<https://www.postgresql.org/docs/9.1/queries-order.html>, it is

said:

If sorting is not chosen, the rows will be returned in an
unspecified order. The actual order in that case will depend

on the

scan and join plan types and the order on disk, but it must

not be

relied on.

I would like to know if there is any way to change that to have a

"real"

random behaviour.

It might be an interesting exercise to implement this as a

post-parsing

hook.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Adrian Klaver
adrian.klaver@aklaver.com

#14

Julien Rouhaud

rjuju123@gmail.com

over 6 years ago

In reply to: Cyril Champier (#13)

Re: Default ordering option

On Fri, Jul 26, 2019 at 9:53 AM Cyril Champier
<cyril.champier@doctolib.com> wrote:

Adrian:

Are you really looking for a pseudo-random name?

No, the code I pasted was an existing production bug: the last_name should have been unique, so the selected patient would always be the same.
This should have been detected in tests, but since the order was "almost always the same", our test was green 99% of the time, so we discarded it as flaky.

If the filter should return at most 1 row, why put a LIMIT in the
first place? Even with a forced random() you won't get a failure
every time, while asserting there's at most 1 row returned is
guaranteed to fail?

#15

Cyril Champier

cyril.champier@doctolib.com

over 6 years ago

In reply to: Julien Rouhaud (#14)

Re: Default ordering option

Julien,

Because it's production code generated by our ORM for this command:
`Patient.find_by(last_name: 'champier')`.
Of course this was not intended by the developer that though the last_name
was unique.

On Fri, Jul 26, 2019 at 10:10 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Show quoted text

On Fri, Jul 26, 2019 at 9:53 AM Cyril Champier
<cyril.champier@doctolib.com> wrote:

Adrian:

Are you really looking for a pseudo-random name?

No, the code I pasted was an existing production bug: the last_name

should have been unique, so the selected patient would always be the same.

This should have been detected in tests, but since the order was "almost

always the same", our test was green 99% of the time, so we discarded it as
flaky.

If the filter should return at most 1 row, why put a LIMIT in the
first place? Even with a forced random() you won't get a failure
every time, while asserting there's at most 1 row returned is
guaranteed to fail?

#16

Peter J. Holzer

hjp-pgsql@hjp.at

over 6 years ago

In reply to: Cyril Champier (#13)

Re: Default ordering option

On 2019-07-26 09:52:52 +0200, Cyril Champier wrote:

No, the code I pasted was an existing production bug: the last_name
should have been unique, so the selected patient would always be the
same. This should have been detected in tests, but since the order was
"almost always the same", our test was green 99% of the time, so we
discarded it as flaky.

Fuzzy testing could be an option, but this would go too far, as for
Peter extension suggestion. We have huge existing codebase with more
than 10K tests, and I do not want to modify our whole testing
strategy.

Meanwhile, I went for an ORM patch (ActiveRecord) and forbid usages
that can workaround it.

Another idea:

How do ypu prepare your test data? Do you have a (possibly large) test
database or do you populate a test database with test-specific data in a
fixture?

If you do the latter, you might be able insert the data in random order.