Using window functions to get the unpaginated count for paginated queries
Hi all,
Recently, during a performance improvement sweep for an application at my
company, one of the hotspots that was discovered was pagination.
In order to display the correct pagination links on the page, the
pagination library we used (most pagination libraries for that matter) ran
the query with OFFSET and LIMIT to get the paginated results, and then
re-ran the query without the OFFSET and LIMIT and wrapped them in a SELECT
COUNT(*) FROM main_query to get the total number of rows.
In an attempt to optimize this, we used a window function as follows:
Given a query that looked as follows:
SELECT a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;
add total_entries_count column as follows:
SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;
This calculates the total number of unpaginated rows correctly, without
affecting the runtime of the query. At least as far as I can tell.
The questions I have are:
1) Are there any adverse effects that the above window function can have?
2) Are there any cases where the count would return incorrectly?
3) In general, is this an appropriate use-case for using window functions?
Thanks,
Clemens
Clemens Park wrote:
Recently, during a performance improvement sweep for an application at
my company, one of the hotspots
that was discovered was pagination.
In order to display the correct pagination links on the page, the
pagination library we used (most
pagination libraries for that matter) ran the query with OFFSET and
LIMIT to get the paginated
results, and then re-ran the query without the OFFSET and LIMIT and
wrapped them in a SELECT COUNT(*)
FROM main_query to get the total number of rows.
In an attempt to optimize this, we used a window function as follows:
Given a query that looked as follows:
SELECT a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;add total_entries_count column as follows:
SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;This calculates the total number of unpaginated rows correctly,
without affecting the runtime of the
query. At least as far as I can tell.
It can affect the runtime considerably.
I created a 10^6 row test table and tried:
test=> EXPLAIN ANALYZE SELECT id, val FROM large OFFSET 100 LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------
Limit (cost=1.49..1.64 rows=10 width=12) (actual time=0.177..0.195
rows=10 loops=1)
-> Seq Scan on large (cost=0.00..14902.00 rows=1000000 width=12)
(actual time=0.028..0.114 rows=110 loops=1)
Total runtime: 0.251 ms
(3 rows)
test=> EXPLAIN ANALYZE SELECT id, val, COUNT(*) OVER () AS
total_entries_count FROM large OFFSET 100 LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------
Limit (cost=2.74..3.01 rows=10 width=12) (actual
time=1893.606..1893.625 rows=10 loops=1)
-> WindowAgg (cost=0.00..27402.00 rows=1000000 width=12) (actual
time=1893.435..1893.559 rows=110 loops=1)
-> Seq Scan on large (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.025..647.182 rows=1000000 loops=1)
Total runtime: 1915.255 ms
(4 rows)
That is because the second query will have to scan all rows, while the
first one can stop scanning after 110 rows.
The questions I have are:
1) Are there any adverse effects that the above window function can
have?
I can only think of the performance degradation mentioned above.
2) Are there any cases where the count would return incorrectly?
No.
3) In general, is this an appropriate use-case for using window
functions?
I think it is.
Maybe you can do better if you don't retrieve the total count
of rows for every set of rows you select.
Yours,
Laurenz Albe
1) Are there any adverse effects that the above window function can have?
It can cause severe performance degradation, as mentioned before.
2) Are there any cases where the count would return incorrectly?
It could return incorrect result if there are some rows with table.id =
NULL . count(table_field) returns the number of rows, where table_field is
not NULL.
3) In general, is this an appropriate use-case for using window functions?
It does the job => it is an appropriate use-case for using window functions.
If this query causes performance degradation and you do not need the exact
count of rows, it is better to use something from
http://wiki.postgresql.org/wiki/Count_estimate or google for "fast
postgresql count".
Thanks for the reply everyone.
In my case, it looks like there is no real drawback then, since what used
to happen is:
SELECT a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;
followed by:
SELECT COUNT(*) FROM (
SELECT a,b,c
FROM table
WHERE clauses
);
(notice the lack of OFFSET and LIMIT)
and both of them were replaced with:
SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;
On Wed, Nov 14, 2012 at 5:11 AM, Igor Romanchenko <
igor.a.romanchenko@gmail.com> wrote:
Show quoted text
1) Are there any adverse effects that the above window function can have?
It can cause severe performance degradation, as mentioned before.
2) Are there any cases where the count would return incorrectly?
It could return incorrect result if there are some rows with table.id =
NULL . count(table_field) returns the number of rows, where table_field is
not NULL.3) In general, is this an appropriate use-case for using window functions?
It does the job => it is an appropriate use-case for using window
functions.If this query causes performance degradation and you do not need the
exact count of rows, it is better to use something from
http://wiki.postgresql.org/wiki/Count_estimate or google for "fast
postgresql count".