Large tables, ORDER BY and sequence/index scans

Started by Milan Zamazalover 16 years ago17 messagesgeneral
Jump to latest
#1Milan Zamazal
pdm@brailcom.org

My problem is that retrieving sorted data from large tables is sometimes
very slow in PostgreSQL (8.4.1, FWIW).

I typically retrieve the data using cursors, to display them in UI:

BEGIN;
DECLARE ... SELECT ... ORDER BY ...;
FETCH ...;
...

On a newly created table of about 10 million rows the FETCH command
takes about one minute by default, with additional delay during the
contingent following COMMIT command. This is because PostgreSQL uses
sequence scan on the table even when there is an index on the ORDER BY
column. When I can force PostgreSQL to perform index scan (e.g. by
setting one of the options enable_seqscan or enable_sort to off), FETCH
response is immediate.

PostgreSQL manual explains motivation for sequence scans of large tables
and I can understand the motivation. Nevertheless such behavior leads
to unacceptably poor performance in my particular case. It is important
to get first resulting rows quickly, to display them to the user without
delay.

My questions are:

- What is your experience with using ORDER BY + indexes on large tables?

- Is there a way to convince PostgreSQL to use index scans automatically
in cases where it is much more efficient? I tried using ANALYZE,
VACUUM and SET STATISTICS, but without success.

- Is it a good idea to set enable_seqscan or enable_sort to "off"
globally in my case? Or to set them to "off" just before working with
large tables? My databases contain short and long tables, often
connected through REFERENCES or joined into views and many of shorter
tables serve as codebooks. Can setting one of the parameters to off
have clearly negative impacts?

- Is there a recommended way to keep indexes in good shape so that the
performance of initial rows retrievals remains good? The large tables
are typically append-only tables with a SERIAL primary key.

Thanks for any tips.

#2Filip Rembiałkowski
plk.zuber@gmail.com
In reply to: Milan Zamazal (#1)
Re: Large tables, ORDER BY and sequence/index scans

2010/1/5 Milan Zamazal <pdm@brailcom.org>

My problem is that retrieving sorted data from large tables is sometimes
very slow in PostgreSQL (8.4.1, FWIW).

I typically retrieve the data using cursors, to display them in UI:

BEGIN;
DECLARE ... SELECT ... ORDER BY ...;
FETCH ...;
...

On a newly created table of about 10 million rows the FETCH command
takes about one minute by default, with additional delay during the
contingent following COMMIT command. This is because PostgreSQL uses
sequence scan on the table even when there is an index on the ORDER BY
column. When I can force PostgreSQL to perform index scan (e.g. by
setting one of the options enable_seqscan or enable_sort to off), FETCH
response is immediate.

PostgreSQL manual explains motivation for sequence scans of large tables
and I can understand the motivation. Nevertheless such behavior leads
to unacceptably poor performance in my particular case. It is important
to get first resulting rows quickly, to display them to the user without
delay.

My questions are:

- What is your experience with using ORDER BY + indexes on large tables?

Without a WHERE clause postgres will almost always choose a sequential scan.

- Is there a way to convince PostgreSQL to use index scans automatically
in cases where it is much more efficient? I tried using ANALYZE,
VACUUM and SET STATISTICS, but without success.

By using cursors you take some responsibility away from the planner.
It has no idea that you want first 100 rows quickly. It just tries to
optimize the whole operation.

- Is it a good idea to set enable_seqscan or enable_sort to "off"
globally in my case? Or to set them to "off" just before working with
large tables? My databases contain short and long tables, often
connected through REFERENCES or joined into views and many of shorter
tables serve as codebooks. Can setting one of the parameters to off
have clearly negative impacts?

IMHO, no, no and yes.

- Is there a recommended way to keep indexes in good shape so that the
performance of initial rows retrievals remains good? The large tables
are typically append-only tables with a SERIAL primary key.

Use partitioning.
If that's not possible, REINDEX periodically to avoid sub-optimal btree
layout. But that's just a half-solution.

Thanks for any tips.

tips:

1. get rid of cursors, unless you have a strong need for them (eg. seeking
back and forth and updating).

2. switch to "chunked" processing, like this:

SELECT * FROM bigtable ORDER by idxcol LIMIT 1000;
(process the records)
SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch]
ORDER by idxcol LIMIT 1000;
... and so on.

pozdrawiam,
Filip

--
Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Milan Zamazal (#1)
Re: Large tables, ORDER BY and sequence/index scans

Hello

please, send explain result

postgres=# explain analyze declare x cursor for select * from foo;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4) (actual
time=0.010..0.012 rows=2 loops=1)
Total runtime: 0.033 ms
(2 rows)

regards
Pavel Stehule

2010/1/5 Milan Zamazal <pdm@brailcom.org>:

Show quoted text

My problem is that retrieving sorted data from large tables is sometimes
very slow in PostgreSQL (8.4.1, FWIW).

I typically retrieve the data using cursors, to display them in UI:

 BEGIN;
 DECLARE ... SELECT ... ORDER BY ...;
 FETCH ...;
 ...

On a newly created table of about 10 million rows the FETCH command
takes about one minute by default, with additional delay during the
contingent following COMMIT command.  This is because PostgreSQL uses
sequence scan on the table even when there is an index on the ORDER BY
column.  When I can force PostgreSQL to perform index scan (e.g. by
setting one of the options enable_seqscan or enable_sort to off), FETCH
response is immediate.

PostgreSQL manual explains motivation for sequence scans of large tables
and I can understand the motivation.  Nevertheless such behavior leads
to unacceptably poor performance in my particular case.  It is important
to get first resulting rows quickly, to display them to the user without
delay.

My questions are:

- What is your experience with using ORDER BY + indexes on large tables?

- Is there a way to convince PostgreSQL to use index scans automatically
 in cases where it is much more efficient?  I tried using ANALYZE,
 VACUUM and SET STATISTICS, but without success.

- Is it a good idea to set enable_seqscan or enable_sort to "off"
 globally in my case?  Or to set them to "off" just before working with
 large tables?  My databases contain short and long tables, often
 connected through REFERENCES or joined into views and many of shorter
 tables serve as codebooks.  Can setting one of the parameters to off
 have clearly negative impacts?

- Is there a recommended way to keep indexes in good shape so that the
 performance of initial rows retrievals remains good?  The large tables
 are typically append-only tables with a SERIAL primary key.

Thanks for any tips.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Milan Zamazal
pdm@brailcom.org
In reply to: Filip Rembiałkowski (#2)
Re: Large tables, ORDER BY and sequence/index scans

"FR" == Filip Rembiałkowski <plk.zuber@gmail.com> writes:

FR> 2010/1/5 Milan Zamazal <pdm@brailcom.org>

- Is it a good idea to set enable_seqscan or enable_sort to "off"
globally in my case? Or to set them to "off" just before working
with large tables? My databases contain short and long tables,
often connected through REFERENCES or joined into views and many
of shorter tables serve as codebooks. Can setting one of the
parameters to off have clearly negative impacts?

FR> IMHO, no, no and yes.

Why (especially the "yes" part)? Any details and/or pointers?

FR> 1. get rid of cursors, unless you have a strong need for them
FR> (eg. seeking back and forth and updating).

Cursors are very convenient for me, because they allow easy browsing
data in the user interface (fetching limited sets of rows while seeking
forward and backward) and they prevent contingent seeking and other
troubles when concurrent updates happen.

FR> 2. switch to "chunked" processing, like this:

FR> SELECT * FROM bigtable ORDER by idxcol LIMIT 1000;
FR> (process the records)
FR> SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch]
FR> ORDER by idxcol LIMIT 1000;
FR> ... and so on.

Not counting the convenience of cursors, this wouldn't work as the
values in idxcol needn't be unique.

Thanks,
Milan Zamazal

#5Milan Zamazal
pdm@brailcom.org
In reply to: Pavel Stehule (#3)
Re: Large tables, ORDER BY and sequence/index scans

"PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> please, send explain result

For ~ 10 million rows table:

explain declare c cursor for select * from foo2 order by value;
QUERY PLAN
----------------------------------------------------------------------
Sort (cost=1829429.20..1854429.20 rows=9999999 width=10)
Sort Key: value
-> Seq Scan on foo2 (cost=0.00..154049.99 rows=9999999 width=10)
(3 rows)

For ~ 1 million rows table, containing the first million rows from foo2
(`value' column contains random integer data):

explain declare c cursor for select * from foo order by value;
QUERY PLAN
-----------------------------------------------------------------------------------
Index Scan using foo_value_idx on foo (cost=0.00..47604.02 rows=999999 width=10)
(1 row)

When seqscan is disabled for the 10 million rows table:

set enable_seqscan = off;
explain declare c cursor for select * from foo2 order by value;
QUERY PLAN
-----------------------------------------------------------------------------------------
Index Scan using foo2_value_idx on foo2 (cost=0.00..32220140.73 rows=9999999 width=10)
(1 row)

Regards,
Milan Zamazal

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Milan Zamazal (#1)
Re: Large tables, ORDER BY and sequence/index scans

Milan Zamazal wrote:

My problem is that retrieving sorted data from large tables
is sometimes
very slow in PostgreSQL (8.4.1, FWIW).

I typically retrieve the data using cursors, to display them in UI:

BEGIN;
DECLARE ... SELECT ... ORDER BY ...;
FETCH ...;
...

On a newly created table of about 10 million rows the FETCH command
takes about one minute by default, with additional delay during the
contingent following COMMIT command. This is because PostgreSQL uses
sequence scan on the table even when there is an index on the ORDER BY
column. When I can force PostgreSQL to perform index scan (e.g. by
setting one of the options enable_seqscan or enable_sort to off), FETCH
response is immediate.

PostgreSQL manual explains motivation for sequence scans of large tables
and I can understand the motivation. Nevertheless such behavior leads
to unacceptably poor performance in my particular case. It is important
to get first resulting rows quickly, to display them to the user without
delay.

Did you try to reduce the cursor_tuple_fraction parameter?

Yours,
Laurenz Albe

#7Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Milan Zamazal (#4)
Re: Large tables, ORDER BY and sequence/index scans

2010/1/5 Milan Zamazal <pdm@brailcom.org>:

Cursors are very convenient for me, because they allow easy browsing
data in the user interface (fetching limited sets of rows while seeking
forward and backward) and they prevent contingent seeking and other
troubles when concurrent updates happen.

Sounds to me like a borked app design. Do you seriously need to walk
the user through couple of million rows of data ?

I mean, databases are designed to work on data, and give you the
result back. Use this capability intelligently.

--
GJ

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Milan Zamazal (#5)
Re: Large tables, ORDER BY and sequence/index scans

please EXPLAIN ANALYZE

Pavel

2010/1/5 Milan Zamazal <pdm@brailcom.org>:

Show quoted text

"PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

   PS> please, send explain result

For ~ 10 million rows table:

 explain declare c cursor for select * from foo2 order by value;
                               QUERY PLAN
 ----------------------------------------------------------------------
  Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10)
    Sort Key: value
    ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10)
 (3 rows)

For ~ 1 million rows table, containing the first million rows from foo2
(`value' column contains random integer data):

 explain declare c cursor for select * from foo order by value;
                                     QUERY PLAN
 -----------------------------------------------------------------------------------
  Index Scan using foo_value_idx on foo  (cost=0.00..47604.02 rows=999999 width=10)
 (1 row)

When seqscan is disabled for the 10 million rows table:

 set enable_seqscan = off;
 explain declare c cursor for select * from foo2 order by value;
                                        QUERY PLAN
 -----------------------------------------------------------------------------------------
  Index Scan using foo2_value_idx on foo2  (cost=0.00..32220140.73 rows=9999999 width=10)
 (1 row)

Regards,
Milan Zamazal

#9Milan Zamazal
pdm@brailcom.org
In reply to: Pavel Stehule (#8)
Re: Large tables, ORDER BY and sequence/index scans

"PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> please EXPLAIN ANALYZE Pavel

I see, I'm sorry. Here are the results:

set enable_seqscan = on;
explain analyze declare c cursor for select * from foo2 order by value;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43432.727..49303.902 rows=9999999 loops=1)
Sort Key: value
Sort Method: external merge Disk: 204208kB
-> Seq Scan on foo2 (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.058..1775.928 rows=9999999 loops=1)
Total runtime: 53693.012 ms
(5 rows)

set enable_seqscan = off;
explain analyze declare c cursor for select * from foo2 order by value;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using foo2_value_idx on foo2 (cost=0.00..32220140.73 rows=9999999 width=10) (actual time=0.225..55055.583 rows=9999999 loops=1)
Total runtime: 56394.283 ms
(2 rows)

#10Milan Zamazal
pdm@brailcom.org
In reply to: Laurenz Albe (#6)
Re: Large tables, ORDER BY and sequence/index scans

"AL" == Albe Laurenz <laurenz.albe@wien.gv.at> writes:

AL> Did you try to reduce the cursor_tuple_fraction parameter?

No, good idea, thanks. It helps.

The question is whether it's a good idea to reduce cursor_tuple_fraction
universally, without knowing the table size before (and I'm not going to
use SELECT COUNT(*) for well known reasons). But I can probably
experiment a bit and will see, it looks promising.

#11Milan Zamazal
pdm@brailcom.org
In reply to: Grzegorz Jaśkiewicz (#7)
Re: Large tables, ORDER BY and sequence/index scans

"GJ" == Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:

GJ> Do you seriously need to walk the user through couple of million
GJ> rows of data ?

Typically not. Data can be of any size. Some tables may be large and
I'd like to understand what happens. It is a general data browser.

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Milan Zamazal (#9)
Re: Large tables, ORDER BY and sequence/index scans

2010/1/5 Milan Zamazal <pdm@brailcom.org>:

"PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

   PS> please EXPLAIN ANALYZE Pavel

I see, I'm sorry.  Here are the results:

Have you original values random_page_cost and seq_page_cost in postgres.conf?

it is strange.

Pavel

Show quoted text

 set enable_seqscan = on;
 explain analyze declare c cursor for select * from foo2 order by value;
                                                        QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43432.727..49303.902 rows=9999999 loops=1)
    Sort Key: value
    Sort Method:  external merge  Disk: 204208kB
    ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.058..1775.928 rows=9999999 loops=1)
  Total runtime: 53693.012 ms
 (5 rows)

 set enable_seqscan = off;
 explain analyze declare c cursor for select * from foo2 order by value;

                                                                  QUERY PLAN
 ---------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using foo2_value_idx on foo2  (cost=0.00..32220140.73 rows=9999999 width=10) (actual time=0.225..55055.583 rows=9999999 loops=1)
  Total runtime: 56394.283 ms
 (2 rows)

#13Milan Zamazal
pdm@brailcom.org
In reply to: Pavel Stehule (#12)
Re: Large tables, ORDER BY and sequence/index scans

"PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> Have you original values random_page_cost and seq_page_cost in
PS> postgres.conf?

Yes. To be sure I uncommented the values in postgresql.conf

seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 4.0 # same scale as above

and restarted PostgreSQL. The result looks basically the same:

explain analyze declare c cursor for select * from foo2 order by value;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43709.313..49265.244 rows=9999999 loops=1)
Sort Key: value
Sort Method: external merge Disk: 204208kB
-> Seq Scan on foo2 (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.072..1760.585 rows=9999999 loops=1)
Total runtime: 54399.967 ms
(5 rows)

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Milan Zamazal (#13)
Re: Large tables, ORDER BY and sequence/index scans

2010/1/5 Milan Zamazal <pdm@brailcom.org>:

"PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

   PS> Have you original values random_page_cost and seq_page_cost in
   PS> postgres.conf?

Yes.  To be sure I uncommented the values in postgresql.conf

 seq_page_cost = 1.0                   # measured on an arbitrary scale
 random_page_cost = 4.0                # same scale as above

and value efective_cache_size ???

what is CREATE INDEX stament for index?

Pavel

Show quoted text

and restarted PostgreSQL.  The result looks basically the same:

 explain analyze declare c cursor for select * from foo2 order by value;
                                                        QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43709.313..49265.244 rows=9999999 loops=1)
    Sort Key: value
    Sort Method:  external merge  Disk: 204208kB
    ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.072..1760.585 rows=9999999 loops=1)
  Total runtime: 54399.967 ms
 (5 rows)

#15Milan Zamazal
pdm@brailcom.org
In reply to: Pavel Stehule (#14)
Re: Large tables, ORDER BY and sequence/index scans

"PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> and value efective_cache_size ???

effective_cache_size = 128MB

PS> what is CREATE INDEX stament for index?

create index foo2_value_idx on foo2(value);

#16John R Pierce
pierce@hogranch.com
In reply to: Milan Zamazal (#15)
Re: Large tables, ORDER BY and sequence/index scans

Milan Zamazal wrote:

PS> and value efective_cache_size ???

effective_cache_size = 128MB

thats rather small unless your systme is very memory constrained.
assuming postgres is the primary disk IO consumer on this ysstem, take a
look at the 'cached' value on TOP or whatever after its been running,
thats a good first order estimate of effective_cache_size.... this is
often half or more of your physical memory.

#17Milan Zamazal
pdm@brailcom.org
In reply to: John R Pierce (#16)
Re: Large tables, ORDER BY and sequence/index scans

"JRP" == John R Pierce <pierce@hogranch.com> writes:

effective_cache_size = 128MB

JRP> thats rather small unless your systme is very memory
JRP> constrained. assuming postgres is the primary disk IO consumer
JRP> on this ysstem, take a look at the cached' value on TOP or
JRP> whatever after its been running, thats a good first order
JRP> estimate of effective_cache_size.... this is often half or more
JRP> of your physical memory.

Indeed, increasing effective_cache_size helps, thanks.

Thank you all for the tips, I hope I understand the planner behavior for
my queries better now.

Regards,
Milan Zamazal