Slow query: select * order by XXX desc offset 10 limit 10

Started by Alexander Farberover 14 years ago11 messagesgeneral
Jump to latest
#1Alexander Farber
alexander.farber@gmail.com

Hello,

I use PostgreSQL 8.4.7 on CentOS 6.0 / 64 bit machine
with Quad-Core AMD Opteron(tm) Processor 2352 and
16 GB RAM and use it for 1 PHP script - which selects
and displays data in jQuery DataTables (i.e. an
HTML-table which can be viewed page by page).

I select records from 1 view which unites 2 identical tables:

quincy=> \d quincyview
View "public.quincyview"
Column | Type | Modifiers
-------------+-----------------------------+-----------
qdatetime | timestamp without time zone |
id | character varying(20) |
name | character varying(20) |
category | character varying(120) |
appsversion | character varying(30) |
osversion | character varying(30) |
beta_prog | character varying(20) |
catinfo | character varying(120) |
details | character varying(50) |
devinfo | character varying(4000) |
email | character varying(320) |
emailid | character varying(16) |
imei | character varying(25) |
pin | character varying(12) |
formfactor | character varying(10) |
copied | timestamp without time zone |
View definition:
SELECT quincynoreset.qdatetime, quincynoreset.id,
quincynoreset.name, quincynoreset.category, quincynoreset.appsversion,
quincynoreset.osversion, quincynoreset.beta_prog,
quincynoreset.catinfo, quincynoreset.details, quincynoreset.devinfo,
quincynoreset.email, quincynoreset.emailid, quincynoreset.imei,
quincynoreset.pin, quincynoreset.formfactor, quincynoreset.copied
FROM quincynoreset
UNION
SELECT quincytrack.qdatetime, quincytrack.id,
quincytrack.name, quincytrack.category, quincytrack.appsversion,
quincytrack.osversion, quincytrack.beta_prog, quincytrack.catinfo,
quincytrack.details, quincytrack.devinfo, quincytrack.email,
quincytrack.emailid, quincytrack.imei, quincytrack.pin,
quincytrack.formfactor, quincytrack.copied
FROM quincytrack;

And here is 1 of the 2 tables (the other is same, except its name):

quincy=> \d quincytrack;
Table "public.quincytrack"
Column | Type | Modifiers
-------------+-----------------------------+---------------
appsversion | character varying(30) |
beta_prog | character varying(20) |
category | character varying(120) |
catinfo | character varying(120) |
details | character varying(50) |
devinfo | character varying(4000) |
emailid | character varying(16) |
email | character varying(320) |
formfactor | character varying(10) |
id | character varying(20) | not null
imei | character varying(25) |
name | character varying(20) |
osversion | character varying(30) |
pin | character varying(12) |
qdatetime | timestamp without time zone |
copied | timestamp without time zone | default now()
Indexes:
"quincytrack_pkey" PRIMARY KEY, btree (id)

There are around 1 mio records in the view:

quincy=> select count(*) from quincyview ;
count
--------
950476
(1 row)

My problem is, that select's are very slow and
using my script is no fun despite all the AJAX stuff -
which only tries to retrieve "offset X limit Y" records:

quincy=> explain select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=600344.67..600344.70 rows=10 width=1172)
-> Sort (cost=600344.65..602859.16 rows=1005804 width=1172)
Sort Key: (to_char(quincyview.qdatetime, 'YYYY-MM-DD'::text))
-> Subquery Scan quincyview (cost=518261.35..573580.57
rows=1005804 width=1172)
-> Unique (cost=518261.35..561008.02 rows=1005804 width=252)
-> Sort (cost=518261.35..520775.86 rows=1005804
width=252)
Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
-> Append (cost=0.00..57003.60
rows=1005804 width=252)
-> Seq Scan on quincynoreset
(cost=0.00..40011.20 rows=863394 width=242)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack
(cost=0.00..6934.36 rows=142410 width=312)
Filter: (qdatetime <= now())

Does anybody please have an idea,
how to speed up my select statements?

Regards
Alex

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Alexander Farber (#1)
Re: Slow query: select * order by XXX desc offset 10 limit 10

On Oct 13, 2011, at 9:41, Alexander Farber <alexander.farber@gmail.com> wrote:

Does anybody please have an idea,
how to speed up my select statements?

Create one or more indexes.

David J.

#3Bill Moran
wmoran@potentialtech.com
In reply to: Alexander Farber (#1)
Re: Slow query: select * order by XXX desc offset 10 limit 10

In response to Alexander Farber <alexander.farber@gmail.com>:

Hello,

I use PostgreSQL 8.4.7 on CentOS 6.0 / 64 bit machine
with Quad-Core AMD Opteron(tm) Processor 2352 and
16 GB RAM and use it for 1 PHP script - which selects
and displays data in jQuery DataTables (i.e. an
HTML-table which can be viewed page by page).

I select records from 1 view which unites 2 identical tables:

quincy=> \d quincyview
View "public.quincyview"
Column | Type | Modifiers
-------------+-----------------------------+-----------
qdatetime | timestamp without time zone |
id | character varying(20) |
name | character varying(20) |
category | character varying(120) |
appsversion | character varying(30) |
osversion | character varying(30) |
beta_prog | character varying(20) |
catinfo | character varying(120) |
details | character varying(50) |
devinfo | character varying(4000) |
email | character varying(320) |
emailid | character varying(16) |
imei | character varying(25) |
pin | character varying(12) |
formfactor | character varying(10) |
copied | timestamp without time zone |
View definition:
SELECT quincynoreset.qdatetime, quincynoreset.id,
quincynoreset.name, quincynoreset.category, quincynoreset.appsversion,
quincynoreset.osversion, quincynoreset.beta_prog,
quincynoreset.catinfo, quincynoreset.details, quincynoreset.devinfo,
quincynoreset.email, quincynoreset.emailid, quincynoreset.imei,
quincynoreset.pin, quincynoreset.formfactor, quincynoreset.copied
FROM quincynoreset
UNION
SELECT quincytrack.qdatetime, quincytrack.id,
quincytrack.name, quincytrack.category, quincytrack.appsversion,
quincytrack.osversion, quincytrack.beta_prog, quincytrack.catinfo,
quincytrack.details, quincytrack.devinfo, quincytrack.email,
quincytrack.emailid, quincytrack.imei, quincytrack.pin,
quincytrack.formfactor, quincytrack.copied
FROM quincytrack;

And here is 1 of the 2 tables (the other is same, except its name):

quincy=> \d quincytrack;
Table "public.quincytrack"
Column | Type | Modifiers
-------------+-----------------------------+---------------
appsversion | character varying(30) |
beta_prog | character varying(20) |
category | character varying(120) |
catinfo | character varying(120) |
details | character varying(50) |
devinfo | character varying(4000) |
emailid | character varying(16) |
email | character varying(320) |
formfactor | character varying(10) |
id | character varying(20) | not null
imei | character varying(25) |
name | character varying(20) |
osversion | character varying(30) |
pin | character varying(12) |
qdatetime | timestamp without time zone |
copied | timestamp without time zone | default now()
Indexes:
"quincytrack_pkey" PRIMARY KEY, btree (id)

There are around 1 mio records in the view:

quincy=> select count(*) from quincyview ;
count
--------
950476
(1 row)

My problem is, that select's are very slow and
using my script is no fun despite all the AJAX stuff -
which only tries to retrieve "offset X limit Y" records:

quincy=> explain select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=600344.67..600344.70 rows=10 width=1172)
-> Sort (cost=600344.65..602859.16 rows=1005804 width=1172)
Sort Key: (to_char(quincyview.qdatetime, 'YYYY-MM-DD'::text))
-> Subquery Scan quincyview (cost=518261.35..573580.57
rows=1005804 width=1172)
-> Unique (cost=518261.35..561008.02 rows=1005804 width=252)
-> Sort (cost=518261.35..520775.86 rows=1005804
width=252)
Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
-> Append (cost=0.00..57003.60
rows=1005804 width=252)
-> Seq Scan on quincynoreset
(cost=0.00..40011.20 rows=863394 width=242)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack
(cost=0.00..6934.36 rows=142410 width=312)
Filter: (qdatetime <= now())

Does anybody please have an idea,
how to speed up my select statements?

#1 Add indexes on qdatetime on both tables
#2 don't try to order/filter by a calculated value. Instead modify the
query to order and filter by the raw timestamptz column, which will
allow that to be done without converting it all to text first. This
will require you to change your aliasing in your query.

A possible solution to #2:
select to_char(qdatetime, 'YYYY-MM-DD') as QDATETIMEFORMATTED,
ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO
from quincyview
where qdatetime <= now()
order by QDATETIME desc
offset 10 limit 10;

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

#4Alexander Farber
alexander.farber@gmail.com
In reply to: Bill Moran (#3)
Re: Slow query: select * order by XXX desc offset 10 limit 10

Hello Bill and others,

On Thu, Oct 13, 2011 at 4:09 PM, Bill Moran <wmoran@potentialtech.com> wrote:

In response to Alexander Farber <alexander.farber@gmail.com>:

I use PostgreSQL 8.4.7 on CentOS 6.0 / 64 bit machine
with Quad-Core AMD Opteron(tm) Processor 2352 and
16 GB RAM and use it for 1 PHP script - which selects
and displays data in jQuery DataTables (i.e. an
HTML-table which can be viewed page by page).

My problem is, that select's are very slow and
using my script is no fun despite all the AJAX stuff -
which only tries to retrieve "offset X limit Y" records:

quincy=> explain select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;

      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
 Limit  (cost=600344.67..600344.70 rows=10 width=1172)
   ->  Sort  (cost=600344.65..602859.16 rows=1005804 width=1172)
         Sort Key: (to_char(quincyview.qdatetime, 'YYYY-MM-DD'::text))
         ->  Subquery Scan quincyview  (cost=518261.35..573580.57
rows=1005804 width=1172)
               ->  Unique  (cost=518261.35..561008.02 rows=1005804 width=252)
                     ->  Sort  (cost=518261.35..520775.86 rows=1005804
width=252)
                           Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
                           ->  Append  (cost=0.00..57003.60
rows=1005804 width=252)
                                 ->  Seq Scan on quincynoreset
(cost=0.00..40011.20 rows=863394 width=242)
                                       Filter: (qdatetime <= now())
                                 ->  Seq Scan on quincytrack
(cost=0.00..6934.36 rows=142410 width=312)
                                       Filter: (qdatetime <= now())

#1 Add indexes on qdatetime on both tables
#2 don't try to order/filter by a calculated value.  Instead modify the
  query to order and filter by the raw timestamptz column, which will
  allow that to be done without converting it all to text first.  This
  will require you to change your aliasing in your query.

A possible solution to #2:
select to_char(qdatetime, 'YYYY-MM-DD') as QDATETIMEFORMATTED,
 ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO
from quincyview
where qdatetime <= now()
order by QDATETIME desc
offset 10 limit 10;

I've added 3 new indices on both tables:

quincy=> \d quincynoreset
Table "public.quincynoreset"
Column | Type | Modifiers
-------------+-----------------------------+---------------
appsversion | character varying(30) |
beta_prog | character varying(20) |
category | character varying(120) |
catinfo | character varying(120) |
details | character varying(50) |
devinfo | character varying(4000) |
emailid | character varying(16) |
email | character varying(320) |
formfactor | character varying(10) |
id | character varying(20) | not null
imei | character varying(25) |
name | character varying(20) |
osversion | character varying(30) |
pin | character varying(12) |
qdatetime | timestamp without time zone |
copied | timestamp without time zone | default now()
Indexes:
"quincynoreset_pkey" PRIMARY KEY, btree (id)
"quincynoreset_appsversion_index" btree (appsversion)
"quincynoreset_osversion_index" btree (osversion)
"quincynoreset_qdatetime_index" btree (qdatetime)

And in my query I've renamed the string column to
QDATETIME_2 (if I've got your suggestion #2 correctly) -
still no visible improvement:

quincy=> explain select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=558551.88..558551.91 rows=10 width=1172)
-> Sort (cost=558551.86..560883.79 rows=932773 width=1172)
Sort Key: (to_char(quincyview.qdatetime, 'YYYY-MM-DD'::text))
-> Subquery Scan quincyview (cost=482428.59..533731.10
rows=932773 width=1172)
-> Unique (cost=482428.59..522071.44 rows=932773 width=252)
-> Sort (cost=482428.59..484760.52 rows=932773 width=252)
Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
-> Append (cost=0.00..55177.71
rows=932773 width=252)
-> Seq Scan on quincynoreset
(cost=0.00..39171.89 rows=807446 width=242)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack
(cost=0.00..6678.09 rows=125327 width=315)
Filter: (qdatetime <= now())
(12 rows)

(XXX same query below but with QDATETIME_2 as column name XXX):

quincy=> explain select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME_2,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO
from quincyview where qdatetime <= now() order by QDATETIME desc
offset 10 limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=558551.88..558551.91 rows=10 width=1172)
-> Sort (cost=558551.86..560883.79 rows=932773 width=1172)
Sort Key: quincyview.qdatetime
-> Subquery Scan quincyview (cost=482428.59..533731.10
rows=932773 width=1172)
-> Unique (cost=482428.59..522071.44 rows=932773 width=252)
-> Sort (cost=482428.59..484760.52 rows=932773 width=252)
Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
-> Append (cost=0.00..55177.71
rows=932773 width=252)
-> Seq Scan on quincynoreset
(cost=0.00..39171.89 rows=807446 width=242)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack
(cost=0.00..6678.09 rows=125327 width=315)
Filter: (qdatetime <= now())
(12 rows)

Regards
Alex

#5Alexander Farber
alexander.farber@gmail.com
In reply to: Alexander Farber (#4)
Re: Slow query: select * order by XXX desc offset 10 limit 10

I've also tried opening cursor:

quincy=> open ref for select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc ;
ERROR: syntax error at or near "open"
LINE 1: open ref for select to_char(qdatetime, 'YYYY-MM-DD') as QDAT...
^

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alexander Farber (#5)
Re: Slow query: select * order by XXX desc offset 10 limit 10

Hello

you should to use a DECLARE statement

http://www.postgresql.org/docs/9.1/interactive/sql-declare.html

and fetch statement

http://www.postgresql.org/docs/9.1/interactive/sql-fetch.html

Regards

Pavel Stehule

2011/10/14 Alexander Farber <alexander.farber@gmail.com>:

Show quoted text

I've also tried opening cursor:

quincy=> open ref for select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc ;
ERROR:  syntax error at or near "open"
LINE 1: open ref for select to_char(qdatetime, 'YYYY-MM-DD') as QDAT...
       ^

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Alexander Farber
alexander.farber@gmail.com
In reply to: Pavel Stehule (#6)
Re: Slow query: select * order by XXX desc offset 10 limit 10

Thank you -

On Fri, Oct 14, 2011 at 11:30 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

you should to use a DECLARE statement
http://www.postgresql.org/docs/9.1/interactive/sql-declare.html
and fetch statement
http://www.postgresql.org/docs/9.1/interactive/sql-fetch.html

I've managed to create a cursor
and can fetch the data row by row:

quincy=> start TRANSACTION;
quincy=> declare XXX cursor for select to_char(qdatetime,
'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc ;
quincy=> fetch XXX;
.....
quincy=> fetch XXX;
.....

But how do I "go back"?

For my jQuery HTML table (DataTables.net)
I need to be able to go back and forth.

Regards
Alex

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alexander Farber (#7)
Re: Slow query: select * order by XXX desc offset 10 limit 10

2011/10/14 Alexander Farber <alexander.farber@gmail.com>:

Thank you -

On Fri, Oct 14, 2011 at 11:30 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

you should to use a DECLARE statement
http://www.postgresql.org/docs/9.1/interactive/sql-declare.html
and fetch statement
http://www.postgresql.org/docs/9.1/interactive/sql-fetch.html

I've managed to create a cursor
and can fetch the data row by row:

quincy=> start TRANSACTION;
quincy=> declare XXX cursor for select to_char(qdatetime,
'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc ;
quincy=> fetch XXX;
.....
quincy=> fetch XXX;
.....

But how do I "go back"?

For my jQuery HTML table (DataTables.net)
I need to be able to go back and forth.

Regards
Alex

you can use a scrollable cursors.

BEGIN WORK;

-- Set up a cursor:
DECLARE liahona SCROLL CURSOR FOR SELECT * FROM films;

-- Fetch the first 5 rows in the cursor liahona:
FETCH FORWARD 5 FROM liahona;

code | title | did | date_prod | kind | len
-------+-------------------------+-----+------------+----------+-------
BL101 | The Third Man | 101 | 1949-12-23 | Drama | 01:44
BL102 | The African Queen | 101 | 1951-08-11 | Romantic | 01:43
JL201 | Une Femme est une Femme | 102 | 1961-03-12 | Romantic | 01:25
P_301 | Vertigo | 103 | 1958-11-14 | Action | 02:08
P_302 | Becket | 103 | 1964-02-03 | Drama | 02:28

-- Fetch the previous row:
FETCH PRIOR FROM liahona;

code | title | did | date_prod | kind | len
-------+---------+-----+------------+--------+-------
P_301 | Vertigo | 103 | 1958-11-14 | Action | 02:08

-- Close the cursor and end the transaction:
CLOSE liahona;
COMMIT WORK;

this example is from doc
http://www.postgresql.org/docs/9.1/interactive/sql-fetch.html

Regards

Pavel

Show quoted text

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Alban Hertroys
haramrae@gmail.com
In reply to: Alexander Farber (#4)
Re: Slow query: select * order by XXX desc offset 10 limit 10

On 14 Oct 2011, at 11:14, Alexander Farber wrote:

I've added 3 new indices on both tables:

quincy=> \d quincynoreset
Table "public.quincynoreset"
Column | Type | Modifiers
-------------+-----------------------------+---------------
appsversion | character varying(30) |
beta_prog | character varying(20) |
category | character varying(120) |
catinfo | character varying(120) |
details | character varying(50) |
devinfo | character varying(4000) |
emailid | character varying(16) |
email | character varying(320) |
formfactor | character varying(10) |
id | character varying(20) | not null
imei | character varying(25) |
name | character varying(20) |
osversion | character varying(30) |
pin | character varying(12) |
qdatetime | timestamp without time zone |
copied | timestamp without time zone | default now()
Indexes:
"quincynoreset_pkey" PRIMARY KEY, btree (id)
"quincynoreset_appsversion_index" btree (appsversion)
"quincynoreset_osversion_index" btree (osversion)
"quincynoreset_qdatetime_index" btree (qdatetime)

(...)

quincy=> explain select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME_2,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO
from quincyview where qdatetime <= now() order by QDATETIME desc
offset 10 limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=558551.88..558551.91 rows=10 width=1172)
-> Sort (cost=558551.86..560883.79 rows=932773 width=1172)
Sort Key: quincyview.qdatetime
-> Subquery Scan quincyview (cost=482428.59..533731.10
rows=932773 width=1172)
-> Unique (cost=482428.59..522071.44 rows=932773 width=252)
-> Sort (cost=482428.59..484760.52 rows=932773 width=252)
Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
-> Append (cost=0.00..55177.71
rows=932773 width=252)
-> Seq Scan on quincynoreset
(cost=0.00..39171.89 rows=807446 width=242)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack
(cost=0.00..6678.09 rows=125327 width=315)
Filter: (qdatetime <= now())
(12 rows)

An explain analyse would have been a bit more informative.

Anyway, I think you get the sequential scans because the UNION requires to sort all the data from both tables to guarantee that the results are unique (hence that long Sort Key at the 7th line of explain output).
For that reason, an index on qdatetime alone won't help much, especially when most of your data has qdatetime <= now(), which is probably the case.

It doesn't matter that you only want 10 results from that set, the database will first have to figure out which those rows are. That gets more complicated because they can come from two different tables, due to the UNION.

Do you really need unique results from that view, or are duplicates acceptable (one from each table)? In that case, try UNION ALL instead of UNION.

If you do need unique results, then you could create an index on the combination of all those fields. That should take out the need for those sequential scans.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.

#10Alexander Farber
alexander.farber@gmail.com
In reply to: Alban Hertroys (#9)
Re: Slow query: select * order by XXX desc offset 10 limit 10

Hi Alban and others -

On Fri, Oct 14, 2011 at 1:34 PM, Alban Hertroys <haramrae@gmail.com> wrote:

Anyway, I think you get the sequential scans because the UNION requires to sort all the data from both tables to guarantee that the results are unique (hence that long Sort Key at the 7th line of explain output).
For that reason, an index on qdatetime alone won't help much, especially when most of your data has qdatetime <= now(), which is probably the case.

It doesn't matter that you only want 10 results from that set, the database will first have to figure out which those rows are. That gets more complicated because they can come from two different tables, due to the UNION.

Do you really need unique results from that view, or are duplicates acceptable (one from each table)? In that case, try UNION ALL instead of UNION.

I don't need unique at all!

So I've run "explain analyse" on the old view:

quincy=> explain analyse select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=559200.14..559200.16 rows=10 width=1172) (actual
time=11311.537..11311.541 rows=10 loops=1)
-> Sort (cost=559200.11..561534.85 rows=933894 width=1172)
(actual time=11311.532..11311.536 rows=20 loops=1)
Sort Key: (to_char(quincyview.qdatetime, 'YYYY-MM-DD'::text))
Sort Method: top-N heapsort Memory: 27kB
-> Subquery Scan quincyview (cost=482985.36..534349.53
rows=933894 width=1172) (actual time=5778.592..9004.
663 rows=934084 loops=1)
-> Unique (cost=482985.36..522675.85 rows=933894
width=254) (actual time=5777.972..7320.816 rows=9340
84 loops=1)
-> Sort (cost=482985.36..485320.09 rows=933894
width=254) (actual time=5777.969..6557.012 rows=
934084 loops=1)
Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
Sort Method: external merge Disk: 180992kB
-> Append (cost=0.00..55205.73
rows=933894 width=254) (actual time=11.592..2242.501 rows=
934084 loops=1)
-> Seq Scan on quincynoreset
(cost=0.00..39188.71 rows=808567 width=244) (actual ti
me=11.591..1739.695 rows=808647 loops=1)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack
(cost=0.00..6678.09 rows=125327 width=315) (actual time=
6.801..298.642 rows=125437 loops=1)
Filter: (qdatetime <= now())
Total runtime: 11363.393 ms
(15 rows)

Then I'v dropped and recreated the view with "union all":

quincy=> drop view quincyview ;
DROP VIEW
quincy=> create view quincyview as
quincy-> select
quincy-> qdatetime,
quincy-> id,
quincy-> name,
quincy-> category,
quincy-> appsversion,
quincy-> osversion,
quincy-> beta_prog,
quincy-> catinfo,
quincy-> details,
quincy-> devinfo,
quincy-> email,
quincy-> emailid,
quincy-> imei,
quincy-> pin,
quincy-> formfactor,
quincy-> copied
quincy-> from quincynoreset
quincy-> union all
quincy-> select
quincy-> qdatetime,
quincy-> id,
quincy-> name,
quincy-> category,
quincy-> appsversion,
quincy-> osversion,
quincy-> beta_prog,
quincy-> catinfo,
quincy-> details,
quincy-> devinfo,
quincy-> email,
quincy-> emailid,
quincy-> imei,
quincy-> pin,
quincy-> formfactor,
quincy-> copied
quincy-> from quincytrack
quincy-> ;
CREATE VIEW

Then "explain analyse" on the same select query again:

quincy=> explain analyse select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
--------------------------
Limit (cost=73052.13..73052.16 rows=10 width=111) (actual
time=3782.645..3782.649 rows=10 loops=1)
-> Sort (cost=73052.11..75386.84 rows=933894 width=111) (actual
time=3782.640..3782.643 rows=20 loops=1)
Sort Key: (to_char(quincynoreset.qdatetime, 'YYYY-MM-DD'::text))
Sort Method: top-N heapsort Memory: 27kB
-> Result (cost=0.00..48201.53 rows=933894 width=111)
(actual time=0.039..2660.561 rows=934084 loops=1)
-> Append (cost=0.00..45866.79 rows=933894 width=111)
(actual time=0.021..1239.916 rows=934084 loops=
1)
-> Seq Scan on quincynoreset
(cost=0.00..39188.71 rows=808567 width=95) (actual time=0.020..916
.249 rows=808647 loops=1)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack (cost=0.00..6678.09
rows=125327 width=215) (actual time=0.030..125.6
49 rows=125437 loops=1)
Filter: (qdatetime <= now())
Total runtime: 3782.759 ms
(11 rows)

Now the script is noticably more enjoyable, thank you!

Do I still need to add indices over the whole union
and what's the syntax please?

I'm also thinking about adding some "pipelining"
(i.e. prefetching 5-10 pages for the HTML-table):
http://datatables.net/release-datatables/examples/server_side/pipeline.html

Regards
Alex

Show quoted text

If you do need unique results, then you could create an index on the combination of all those fields. That should take out the need for those sequential scans.

#11Alban Hertroys
haramrae@gmail.com
In reply to: Alexander Farber (#10)
Re: Slow query: select * order by XXX desc offset 10 limit 10

On 14 Oct 2011, at 13:58, Alexander Farber wrote:

Hi Alban and others -

On Fri, Oct 14, 2011 at 1:34 PM, Alban Hertroys <haramrae@gmail.com> wrote:

Anyway, I think you get the sequential scans because the UNION requires to sort all the data from both tables to guarantee that the results are unique (hence that long Sort Key at the 7th line of explain output).
For that reason, an index on qdatetime alone won't help much, especially when most of your data has qdatetime <= now(), which is probably the case.

It doesn't matter that you only want 10 results from that set, the database will first have to figure out which those rows are. That gets more complicated because they can come from two different tables, due to the UNION.

Do you really need unique results from that view, or are duplicates acceptable (one from each table)? In that case, try UNION ALL instead of UNION.

I don't need unique at all!

So I've run "explain analyse" on the old view:

quincy=> explain analyse select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------
Limit (cost=559200.14..559200.16 rows=10 width=1172) (actual
time=11311.537..11311.541 rows=10 loops=1)
-> Sort (cost=559200.11..561534.85 rows=933894 width=1172)
(actual time=11311.532..11311.536 rows=20 loops=1)
Sort Key: (to_char(quincyview.qdatetime, 'YYYY-MM-DD'::text))
Sort Method: top-N heapsort Memory: 27kB
-> Subquery Scan quincyview (cost=482985.36..534349.53
rows=933894 width=1172) (actual time=5778.592..9004.
663 rows=934084 loops=1)
-> Unique (cost=482985.36..522675.85 rows=933894
width=254) (actual time=5777.972..7320.816 rows=9340
84 loops=1)
-> Sort (cost=482985.36..485320.09 rows=933894
width=254) (actual time=5777.969..6557.012 rows=
934084 loops=1)
Sort Key: quincynoreset.qdatetime,
quincynoreset.id, quincynoreset.name, quincynoreset.cate
gory, quincynoreset.appsversion, quincynoreset.osversion,
quincynoreset.beta_prog, quincynoreset.catinfo, quincynorese
t.details, quincynoreset.devinfo, quincynoreset.email,
quincynoreset.emailid, quincynoreset.imei, quincynoreset.pin, q
uincynoreset.formfactor, quincynoreset.copied
Sort Method: external merge Disk: 180992kB
-> Append (cost=0.00..55205.73
rows=933894 width=254) (actual time=11.592..2242.501 rows=
934084 loops=1)
-> Seq Scan on quincynoreset
(cost=0.00..39188.71 rows=808567 width=244) (actual ti
me=11.591..1739.695 rows=808647 loops=1)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack
(cost=0.00..6678.09 rows=125327 width=315) (actual time=
6.801..298.642 rows=125437 loops=1)
Filter: (qdatetime <= now())
Total runtime: 11363.393 ms
(15 rows)

You can paste those in http://explain.depesz.com/ to get an analysis of where the most time gets spent in your query. That's often a bit quicker at pointing you to the sore points.

It shows that much of the time in the query is spent on those sorts, and we just eliminated the worst offender ;)

Then I'v dropped and recreated the view with "union all":

Then "explain analyse" on the same select query again:

quincy=> explain analyse select to_char(qdatetime, 'YYYY-MM-DD') as
QDATETIME,ID,NAME,CATEGORY,APPSVERSION,OSVERSION,DETAILS,DEVINFO from
quincyview where qdatetime <= now() order by QDATETIME desc offset 10
limit 10;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
--------------------------
Limit (cost=73052.13..73052.16 rows=10 width=111) (actual
time=3782.645..3782.649 rows=10 loops=1)
-> Sort (cost=73052.11..75386.84 rows=933894 width=111) (actual
time=3782.640..3782.643 rows=20 loops=1)
Sort Key: (to_char(quincynoreset.qdatetime, 'YYYY-MM-DD'::text))
Sort Method: top-N heapsort Memory: 27kB
-> Result (cost=0.00..48201.53 rows=933894 width=111)
(actual time=0.039..2660.561 rows=934084 loops=1)
-> Append (cost=0.00..45866.79 rows=933894 width=111)
(actual time=0.021..1239.916 rows=934084 loops=
1)
-> Seq Scan on quincynoreset
(cost=0.00..39188.71 rows=808567 width=95) (actual time=0.020..916
.249 rows=808647 loops=1)
Filter: (qdatetime <= now())
-> Seq Scan on quincytrack (cost=0.00..6678.09
rows=125327 width=215) (actual time=0.030..125.6
49 rows=125437 loops=1)
Filter: (qdatetime <= now())
Total runtime: 3782.759 ms
(11 rows)

Now the script is noticably more enjoyable, thank you!

This time it's spending a large portion of it's time sorting on that to_char function. As Bill mentioned, the qdatetime in the SELECT list causes that the qdatetime in the ORDER BY uses the "updated definition" from your SELECT list. It doesn't need to do that, the actual timestamp is just as good at that and on that column you have an index!

Do I still need to add indices over the whole union
and what's the syntax please?

Nope, you only needed that because the query was sorting on all those columns.

For the record, an index like that is called a multi-column index and the definition would be:
CREATE INDEX quincynoreset_full_idx ON quincynoreset (qdatetime, id, name, category, appsversion, osversion, beta_prog, catinfo, details, devinfo, email, emailid, imei, pin, formfactor, copied);

(Don't pay too much attention to the order of columns there, I just took the columns and their respective order from the earlier query plan)

I'm also thinking about adding some "pipelining"
(i.e. prefetching 5-10 pages for the HTML-table):
http://datatables.net/release-datatables/examples/server_side/pipeline.html

Firing more selective queries at the database would help as well.

It looks like you're implementing some kind of paging through the result set, but do you really think people will want to page through 100,000 pages of results?

I think they'd be much happier if you would rank the results somehow, so that they get what they're looking for relatively quickly.
Or you could divide up the information - if they have some idea of when the event they're looking for occurred, that helps narrow down the data set a lot and actually increases their chances of finding it.
Those are just a few examples, it much depends on the data and the users you're working with.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.