mal advice in FAQ 4.1.

Started by Pavel Stehuleover 18 years ago7 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hello

I found lot of slow queries in some databases which I checked based on
advice 4.1. from FAQ,

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

It's robust and slow on bigger tables. Can we add some better solutions?

Regards
Pavel Stehule

#2Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Pavel Stehule (#1)
Re: mal advice in FAQ 4.1.

Hubert recently posted his thoughts on this topic:
http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/

I've encountered with this problem several times in web development and
every time found out that the best (in terms of performance) solution is to
use some pseudo random approach (such as ">= random() limit 1" or "limit 1
offset random()*N" or even pre-caching rows on app side).

On 10/9/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

I found lot of slow queries in some databases which I checked based on
advice 4.1. from FAQ,

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

It's robust and slow on bigger tables. Can we add some better solutions?

--
Best regards,
Nikolay

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Nikolay Samokhvalov (#2)
Re: mal advice in FAQ 4.1.

2007/10/9, Nikolay Samokhvalov <samokhvalov@gmail.com>:

Hubert recently posted his thoughts on this topic:
http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/

I've encountered with this problem several times in web development and
every time found out that the best (in terms of performance) solution is to
use some pseudo random approach (such as ">= random() limit 1" or "limit 1
offset random()*N" or even pre-caching rows on app side).

I know this article, but you cannot link from faq to private
(unstable) blog. it would article on techdoc.postgresql.org

Pavel

Show quoted text

On 10/9/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

I found lot of slow queries in some databases which I checked based on
advice 4.1. from FAQ,

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

It's robust and slow on bigger tables. Can we add some better solutions?

--
Best regards,
Nikolay

#4Gregory Stark
stark@enterprisedb.com
In reply to: Nikolay Samokhvalov (#2)
Re: mal advice in FAQ 4.1.

"Nikolay Samokhvalov" <samokhvalov@gmail.com> writes:

Hubert recently posted his thoughts on this topic:
http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/

I've encountered with this problem several times in web development and
every time found out that the best (in terms of performance) solution is to
use some pseudo random approach (such as ">= random() limit 1" or "limit 1
offset random()*N" or even pre-caching rows on app side).

"ORDER BY random() LIMIT 1" should be faster in 8.3 due to the bounded-sort
optimization. It should be basically the same as the two options above as far
as how many comparisons are done and how much memory is used. It does have to
call random() for every record whereas the solutions above only call random()
once.

But I think all of these are basically the same to a first degree
approximation. They all have to do a scan of all the records being considered.
If you want something faster you need a solution which can use an index to
scan only the target record. There are ways of doing that but they require
some application knowledge.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Gregory Stark (#4)
Re: mal advice in FAQ 4.1.

2007/10/9, Gregory Stark <stark@enterprisedb.com>:

"Nikolay Samokhvalov" <samokhvalov@gmail.com> writes:

Hubert recently posted his thoughts on this topic:
http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/

I've encountered with this problem several times in web development and
every time found out that the best (in terms of performance) solution is to
use some pseudo random approach (such as ">= random() limit 1" or "limit 1
offset random()*N" or even pre-caching rows on app side).

"ORDER BY random() LIMIT 1" should be faster in 8.3 due to the bounded-sort
optimization. It should be basically the same as the two options above as far
as how many comparisons are done and how much memory is used. It does have to
call random() for every record whereas the solutions above only call random()
once.

But I think all of these are basically the same to a first degree
approximation. They all have to do a scan of all the records being considered.
If you want something faster you need a solution which can use an index to
scan only the target record. There are ways of doing that but they require
some application knowledge.

It needs always seq scan :(, and take space on buffer cache. Solution
based on random generated PK are much faster. I collaborate with one
my customer. He shows random products from 10K products on every page
of one eShop. And he cannot understand, so ORDER random() LIMIT is bad
trick, because this trick is on PostgreSQL FAQ.

Pavel

#6Martijn van Oosterhout
kleptog@svana.org
In reply to: Pavel Stehule (#5)
Re: mal advice in FAQ 4.1.

On Tue, Oct 09, 2007 at 06:40:23PM +0200, Pavel Stehule wrote:

It needs always seq scan :(, and take space on buffer cache. Solution
based on random generated PK are much faster. I collaborate with one
my customer. He shows random products from 10K products on every page
of one eShop. And he cannot understand, so ORDER random() LIMIT is bad
trick, because this trick is on PostgreSQL FAQ.

It's the only trick that works in all situations though. There are
ofcourse faster methods, but they require information about the
distribution of the values, the type, PKs, indexes etc...

The standard does have stuff relating to extracting samples from
tables, but they're not implemented.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Martijn van Oosterhout (#6)
Re: mal advice in FAQ 4.1.

2007/10/9, Martijn van Oosterhout <kleptog@svana.org>:

On Tue, Oct 09, 2007 at 06:40:23PM +0200, Pavel Stehule wrote:

It needs always seq scan :(, and take space on buffer cache. Solution
based on random generated PK are much faster. I collaborate with one
my customer. He shows random products from 10K products on every page
of one eShop. And he cannot understand, so ORDER random() LIMIT is bad
trick, because this trick is on PostgreSQL FAQ.

It's the only trick that works in all situations though. There are
ofcourse faster methods, but they require information about the
distribution of the values, the type, PKs, indexes etc...

The standard does have stuff relating to extracting samples from
tables, but they're not implemented.

I agree. I don't wont to remove it from FAQ. I would to add note, so
sometimes is necessary to find other trick.

Regards
Pavel