Use order by clause, got index scan involved
Hi all:
What confused me is that: When I select data using order by clause, I got
the following execution plan:
postgres=# set session
enable_indexscan=true;
SET
postgres=# explain SELECT * FROM pg_proc ORDER BY
oid;
QUERY
PLAN
----------------------------------------------------------------------------------------
Index Scan using pg_proc_oid_index on pg_proc (cost=0.00..321.60
rows=2490 width=552)
(1
row)
postgres=#
My Question is :
If I want to find record using the where clause which hold the id column,
the index scan might be used.
But I just want to get all the records on sorted output format, Why
index scan can be used here?
I can’t imagine that:
Step 1 Index is read into memory, then for each tuple in it,
Step 2 Then we got the address of related data block, and then access the
data block .
Step 2 will be repeated for many times. I think it is not efficient.
But comparing with sort , I got that even index scan with all the entry ,
the cost is still lower than sort operation:
postgres=# set session enable_indexscan=false;
SET
postgres=# explain SELECT * FROM pg_proc ORDER BY oid;
QUERY PLAN
-------------------------------------------------------------------
Sort (cost=843.36..849.59 rows=2490 width=552)
Sort Key: oid
-> Seq Scan on pg_proc (cost=0.00..86.90 rows=2490 width=552)
(3 rows)
postgres=#
That is to say: cost of seq scan + sort > cost of index scan for every
index entry + cost of access for every related data ?
Maybe the database system is clever enough to accumulate data access for
same physical page, and reduce the times of physical page acess ?
And can somebody kindly give some more detailed information which help to
know the execution plan calculation process?
Thanks in advance.
On Wed, Nov 7, 2012 at 11:41 PM, 高健 <luckyjackgao@gmail.com> wrote:
Hi all:
What confused me is that: When I select data using order by clause, I
got the following execution plan:postgres=# set session
enable_indexscan=true;SET
postgres=# explain SELECT * FROM pg_proc ORDER BY
oid;QUERY
PLAN----------------------------------------------------------------------------------------
Index Scan using pg_proc_oid_index on pg_proc (cost=0.00..321.60
rows=2490 width=552)
You should probably use sample cases much larger than this when trying to
understand the planner. With queries this small, it almost doesn't matter
what plan is chosen.
(1
row)postgres=#
My Question is :
If I want to find record using the where clause which hold the id
column, the index scan might be used.But I just want to get all the records on sorted output format, Why
index scan can be used here?I can’t imagine that:
Step 1 Index is read into memory, then for each tuple in it,
Step 2 Then we got the address of related data block, and then access
the data block .Step 2 will be repeated for many times. I think it is not efficient.
But step 2 will repeatedly find the block it is visiting to already be in
memory, so it is efficient.
Maybe the database system is clever enough to accumulate data access for
same physical page, and reduce the times of physical page acess ?
There is a bitmap scan which does that, but such a scan can't be used to
fulfill a sort, because it doesn't return the rows in index order. What
reduces the cost here is the various levels of caching implemented by the
file system, the memory system, and the CPU. PG uses
"effective_cache_size" to try to account for these effects, although I
admit I don't quite understand what exactly it is doing in this case. I
thought that setting effective_cache_size to absurdly low values would make
the index scan cost estimate go up a lot, but it only made it go up a
little.
Cheers,
Jeff
Hi Jeff
Thank you for your reply.
I will try to learn about effective_cache_size .
Jian gao
2012/11/9 Jeff Janes <jeff.janes@gmail.com>
Show quoted text
On Wed, Nov 7, 2012 at 11:41 PM, 高健 <luckyjackgao@gmail.com> wrote:
Hi all:
What confused me is that: When I select data using order by clause, I
got the following execution plan:postgres=# set session
enable_indexscan=true;SET
postgres=# explain SELECT * FROM pg_proc ORDER BY
oid;QUERY
PLAN----------------------------------------------------------------------------------------
Index Scan using pg_proc_oid_index on pg_proc (cost=0.00..321.60
rows=2490 width=552)You should probably use sample cases much larger than this when trying to
understand the planner. With queries this small, it almost doesn't matter
what plan is chosen.(1
row)postgres=#
My Question is :
If I want to find record using the where clause which hold the id
column, the index scan might be used.But I just want to get all the records on sorted output format, Why
index scan can be used here?I can’t imagine that:
Step 1 Index is read into memory, then for each tuple in it,
Step 2 Then we got the address of related data block, and then access
the data block .Step 2 will be repeated for many times. I think it is not efficient.
But step 2 will repeatedly find the block it is visiting to already be in
memory, so it is efficient.Maybe the database system is clever enough to accumulate data access for
same physical page, and reduce the times of physical page acess ?There is a bitmap scan which does that, but such a scan can't be used to
fulfill a sort, because it doesn't return the rows in index order. What
reduces the cost here is the various levels of caching implemented by the
file system, the memory system, and the CPU. PG uses
"effective_cache_size" to try to account for these effects, although I
admit I don't quite understand what exactly it is doing in this case. I
thought that setting effective_cache_size to absurdly low values would make
the index scan cost estimate go up a lot, but it only made it go up a
little.Cheers,
Jeff