selecting random rows

Started by Joseph Shraibmanover 22 years ago7 messagesgeneral
Jump to latest
#1Joseph Shraibman
jks@selectacast.net

Is there a way to get random rows besides ORDER BY random()? The problem with ORDER BY
random() is that is has to get all the rows from the table before the results are returned.

#2Bruce Momjian
bruce@momjian.us
In reply to: Joseph Shraibman (#1)
Re: selecting random rows

Joseph Shraibman wrote:

Is there a way to get random rows besides ORDER BY random()? The problem with ORDER BY
random() is that is has to get all the rows from the table before the results are returned.

Yes, I think one person's idea was to assign a unique value to every
row, then do:

WHERE col > random()
ORDER BY col
LIMIT 1

or something like that.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3Dennis Gearon
gearond@fireserve.net
In reply to: Joseph Shraibman (#1)
Re: selecting random rows

If you have a nice small Primary key on the table, you can so something
like this:

SELECT field_list
FROM table
WHERE primary_key IN(
SELECT primary_key,
FROM table
ORDER by RANDOM()
LIMIT your_limit);

This may not be the exact sequence, and there is some workarounds for
some slowness in the IN() keyword. Others may comment at will, and
polish this up?

Joseph Shraibman wrote:

Show quoted text

Is there a way to get random rows besides ORDER BY random()? The
problem with ORDER BY random() is that is has to get all the rows from
the table before the results are returned.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
joining column's datatypes do not match

#4Joseph Shraibman
jks@selectacast.net
In reply to: Dennis Gearon (#3)
Re: selecting random rows

Dennis Gearon wrote:

If you have a nice small Primary key on the table, you can so something
like this:

SELECT field_list
FROM table
WHERE primary_key IN(
SELECT primary_key,
FROM table
ORDER by RANDOM()
LIMIT your_limit);

This may not be the exact sequence, and there is some workarounds for
some slowness in the IN() keyword. Others may comment at will, and
polish this up?

What exactly does that do for me? Postgres still has to go over the whole table to get
the primary keys.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joseph Shraibman (#1)
Re: selecting random rows

Joseph Shraibman <jks@selectacast.net> writes:

Is there a way to get random rows besides ORDER BY random()?

Are you willing to expend an extra column in the table, plus an index on
the column, to make this fast? Then you can do it --- see discussion
just a few days ago.
http://archives.postgresql.org/pgsql-performance/2003-08/msg00526.php

If you are willing to settle for "only approximately random", you might
be able to use the primary key as a pseudo-random value. Again, see
prior discussion.

regards, tom lane

#6Dennis Gearon
gearond@fireserve.net
In reply to: Joseph Shraibman (#4)
Re: selecting random rows

Joseph Shraibman wrote:

Dennis Gearon wrote:

If you have a nice small Primary key on the table, you can so
something like this:

SELECT field_list
FROM table
WHERE primary_key IN(
SELECT primary_key,
FROM table
ORDER by RANDOM()
LIMIT your_limit);

This may not be the exact sequence, and there is some workarounds for
some slowness in the IN() keyword. Others may comment at will, and
polish this up?

What exactly does that do for me? Postgres still has to go over the
whole table to get the primary keys.

But it only caches the keys in the one sub select, NOT the whole row
that you eventually want. I have not idea whether you just want the
primary key, or several fields.

#7scott.marlowe
scott.marlowe@ihs.com
In reply to: Joseph Shraibman (#1)
Re: selecting random rows

On Thu, 11 Sep 2003, Joseph Shraibman wrote:

Is there a way to get random rows besides ORDER BY random()? The problem with ORDER BY
random() is that is has to get all the rows from the table before the results are returned.

If you have a column that is a sequence of numbers with no holes, and you
already know the row count, you can get fairly fast random choices from it
with:

select * from accounts where aid = (select (floor(random()*10000)));

as long as the column has an index.

explain analyze select * from accounts where aid = (select
(floor(random()*10000)));
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on accounts (cost=0.02..3177.02 rows=501 width=100) (actual
time=9.34..390.30 rows=1 loops=1)
Filter: ((aid)::double precision = $0)
InitPlan
-> Result (cost=0.00..0.02 rows=1 width=0) (actual time=0.06..0.06
rows=1 loops=1)
Total runtime: 390.48 msec

But the fastest way is to generate your random number in whatever code you
program your apps in (i.e. rand(0,rowcount-1) and use that number with
limit and offset or above if you have a sequential column with no holes in
it.

Really, it depends on how much you'll be doing it. If it's to randomly
pick a banner ad for a website, then it's worth the extra effort to have
such a sequence in your table. If it's a once a day kinda thing, then
performance probably isn't quite as big of an issue.