Pagination count strategies

Started by Leonardo M. Raméabout 12 years ago9 messagesgeneral
Jump to latest
#1Leonardo M. Ramé
l.rame@griensu.com

Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
in replacement of doing two queries, one for fetching the records, and
other for getting the count. This improved the performance, but we are't
happy with the results yet, by removing the count, the query takes
200ms vs 2000ms with it.

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

Regards,
--
Leonardo M. Ram�
Medical IT - Griensu S.A.
Av. Col�n 636 - Piso 8 Of. A
X5000EPT -- C�rdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Leonardo M. Ramé (#1)
Re: Pagination count strategies

On Thu, Apr 03, 2014 at 10:34:32AM -0300, Leonardo M. Ramé wrote:

What strategy for showing the total number of records returned do you
recommend?.

The best answer for this I've ever seen is to limit the number of rows
you're counting (at least at first) to some reasonably small number --
say 5000. This is usually reasonably fast for a well-indexed query,
and your pagination can say something like "First n of at least 5000
results", unless you have fewer than 5000 results, in which case you
know the number (and the count returned quickly anyway). As you're
displaying those first 5000 results, you can work in the background
getting a more accurate number. This is more work for your
application, but it provides a much better user experience (and you
can delay getting the detailed number until the user pages through to
the second page of results, so you don't count everything needlessly
in case the user just uses the first page, which IME happens a lot).
Note that even Google doesn't give you an accurate number -- they just
say "about ten trillion" or whatever.

Hope that's useful,

A

--
Andrew Sullivan
ajs@crankycanuck.ca

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Dorian Hoxha
dorian.hoxha@gmail.com
In reply to: Leonardo M. Ramé (#1)
Re: Pagination count strategies

Cache the total ?

On Thu, Apr 3, 2014 at 3:34 PM, Leonardo M. Ramé <l.rame@griensu.com> wrote:

Show quoted text

Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
in replacement of doing two queries, one for fetching the records, and
other for getting the count. This improved the performance, but we are't
happy with the results yet, by removing the count, the query takes
200ms vs 2000ms with it.

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Alban Hertroys
haramrae@gmail.com
In reply to: Leonardo M. Ramé (#1)
Re: Pagination count strategies

On 3 April 2014 15:34, Leonardo M. Ramé <l.rame@griensu.com> wrote:

Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
in replacement of doing two queries, one for fetching the records, and
other for getting the count. This improved the performance, but we are't
happy with the results yet, by removing the count, the query takes
200ms vs 2000ms with it.

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

Assuming your results are unique (and what would be the point of
showing duplicate results?) in a specific order, it should be possible
to (uniquely) identify the last record shown on a previous page and
display n results from there on.

To add a result counter for displaying purposes to that, since you
need to remember the last displayed result anyway, you might as well
store the relative record number of that result with it and count from
there on.

This works well if you only have a next/previous results link, but not
so well when people can pick arbitrary page numbers. It can work
relative to the current page regardless of which page the user chose
to navigate to next, but you'll have to go through all the results in
between that page and the current page... That should still be faster
than always counting from the start though (and you can be smart about
from which end you start counting).

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Torsten Förtsch
torsten.foertsch@gmx.net
In reply to: Leonardo M. Ramé (#1)
Re: Pagination count strategies

On 03/04/14 15:34, Leonardo M. Ram� wrote:

Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
in replacement of doing two queries, one for fetching the records, and
other for getting the count. This improved the performance, but we are't
happy with the results yet, by removing the count, the query takes
200ms vs 2000ms with it.

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

If you need only an estimated number and if your planner statistics are
up to date, you can use the planner.

Here is my implementation of the explain function. The COMMENT below
shows how to use it:

CREATE OR REPLACE FUNCTION explain(VARIADIC TEXT[])
RETURNS JSON AS $$
DECLARE
tmp TEXT;
BEGIN
EXECUTE 'EXPLAIN ('
|| array_to_string(array_append($1[2:array_upper($1, 1)],
'FORMAT JSON'), ', ')
|| ') '
|| $1[1] INTO tmp;
RETURN tmp::JSON;
END;
$$ LANGUAGE plpgsql VOLATILE;

COMMENT ON FUNCTION explain(VARIADIC TEXT[])
IS $def$
This function is a SQL interface to the planner. It returns the plan
(result of EXPLAIN) of the query passed as TEXT string as the first
parameter as JSON object.

The remaining parameters are EXPLAIN-modifiers, like ANALYZE or
BUFFERS.

The function can be used to store plans in the database.

Another interesting usage is when you need only an estimated row
count for a query. You can use

SELECT count(*) ...

This gives you an exact number but is usually slow. If your planner
statistics are up to date and the query is not too complicated, the
planner usually gives a good estimate and is much faster.

SELECT explain('SELECT 1 FROM tb WHERE id>80000000')
->0->'Plan'->'Plan Rows';
$def$;

Torsten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Leonardo M. Ramé
l.rame@griensu.com
In reply to: Andrew Sullivan (#2)
Re: Pagination count strategies

On 2014-04-03 10:00:18 -0400, Andrew Sullivan wrote:

On Thu, Apr 03, 2014 at 10:34:32AM -0300, Leonardo M. Ram� wrote:

What strategy for showing the total number of records returned do you
recommend?.

The best answer for this I've ever seen is to limit the number of rows
you're counting (at least at first) to some reasonably small number --
say 5000. This is usually reasonably fast for a well-indexed query,
and your pagination can say something like "First n of at least 5000
results", unless you have fewer than 5000 results, in which case you
know the number (and the count returned quickly anyway). As you're
displaying those first 5000 results, you can work in the background
getting a more accurate number. This is more work for your
application, but it provides a much better user experience (and you
can delay getting the detailed number until the user pages through to
the second page of results, so you don't count everything needlessly
in case the user just uses the first page, which IME happens a lot).
Note that even Google doesn't give you an accurate number -- they just
say "about ten trillion" or whatever.

Hope that's useful,

A

Sounds nice, is it possible to modify my "count(*) over()" to what you
suggest?.

--
Leonardo M. Ram�
Medical IT - Griensu S.A.
Av. Col�n 636 - Piso 8 Of. A
X5000EPT -- C�rdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Leonardo M. Ramé
l.rame@griensu.com
In reply to: Torsten Förtsch (#5)
Re: Pagination count strategies

On 2014-04-03 17:19:56 +0200, Torsten F�rtsch wrote:

On 03/04/14 15:34, Leonardo M. Ram� wrote:

Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
in replacement of doing two queries, one for fetching the records, and
other for getting the count. This improved the performance, but we are't
happy with the results yet, by removing the count, the query takes
200ms vs 2000ms with it.

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

If you need only an estimated number and if your planner statistics are
up to date, you can use the planner.

Here is my implementation of the explain function. The COMMENT below
shows how to use it:

CREATE OR REPLACE FUNCTION explain(VARIADIC TEXT[])
RETURNS JSON AS $$
DECLARE
tmp TEXT;
BEGIN
EXECUTE 'EXPLAIN ('
|| array_to_string(array_append($1[2:array_upper($1, 1)],
'FORMAT JSON'), ', ')
|| ') '
|| $1[1] INTO tmp;
RETURN tmp::JSON;
END;
$$ LANGUAGE plpgsql VOLATILE;

COMMENT ON FUNCTION explain(VARIADIC TEXT[])
IS $def$
This function is a SQL interface to the planner. It returns the plan
(result of EXPLAIN) of the query passed as TEXT string as the first
parameter as JSON object.

The remaining parameters are EXPLAIN-modifiers, like ANALYZE or
BUFFERS.

The function can be used to store plans in the database.

Another interesting usage is when you need only an estimated row
count for a query. You can use

SELECT count(*) ...

This gives you an exact number but is usually slow. If your planner
statistics are up to date and the query is not too complicated, the
planner usually gives a good estimate and is much faster.

SELECT explain('SELECT 1 FROM tb WHERE id>80000000')
->0->'Plan'->'Plan Rows';
$def$;

Torsten

Nice!, do you know if this will work on 8.4?.

--
Leonardo M. Ram�
Medical IT - Griensu S.A.
Av. Col�n 636 - Piso 8 Of. A
X5000EPT -- C�rdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Leonardo M. Ramé (#6)
Re: Pagination count strategies

On Thu, Apr 03, 2014 at 12:44:23PM -0300, Leonardo M. Ramé wrote:

Sounds nice, is it possible to modify my "count(*) over()" to what you
suggest?.

I think the window_definition inside over() can contain a LIMIT, can't
it? I didn't check just now, but I can't think any reason why not.
(ISTR when I did this in the past we didn't have window functions, so
I simulated it another way.)

A

--
Andrew Sullivan
ajs@crankycanuck.ca

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Torsten Förtsch
torsten.foertsch@gmx.net
In reply to: Leonardo M. Ramé (#7)
Re: Pagination count strategies

On 03/04/14 17:44, Leonardo M. Ram� wrote:

Nice!, do you know if this will work on 8.4?.

no way

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general