Look at heap_beginscan()

Started by Bruce Momjian over 25 years ago, 5 messages
#1 Bruce Momjian
pgman@candle.pha.pa.us

Just do a search for heap_beginscan() and look at all those system table
heap scans. Clearly, for large installations, we should be doing index
scans.

Seems like we should consider identifying all of the needed system
indexes. I can add all indexes at one time, and people can go around
and modify heap scans to index scans with heap_fetches of the tid if
required.
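
For reference, a rough sketch of the two patterns in question, written
against roughly the 7.x-era backend APIs; exact signatures have moved
around between releases, and the pg_type lookup, scan key, and index
name below are only illustrative:

    /*
     * Sketch only: a sequential catalog scan versus the index-scan-plus-
     * heap_fetch() replacement suggested above.  Written against roughly
     * the 7.x-era APIs; the catalog (pg_type), lookup key, and index name
     * are illustrative placeholders, not a finished patch.
     */
    #include "postgres.h"
    #include "access/heapam.h"
    #include "access/genam.h"
    /* ... plus the usual catalog, fmgr, and snapshot headers ... */

    static void
    scan_pg_type_both_ways(Relation typeRel, char *typname)
    {
        ScanKeyData key;
        HeapScanDesc hscan;
        HeapTuple   tup;
        Relation    irel;
        IndexScanDesc iscan;
        RetrieveIndexResult ires;
        HeapTupleData htup;
        Buffer      buf;

        /* --- pattern 1: sequential scan of the catalog --- */
        ScanKeyEntryInitialize(&key, 0, Anum_pg_type_typname,
                               F_NAMEEQ, PointerGetDatum(typname));
        hscan = heap_beginscan(typeRel, false, SnapshotNow, 1, &key);
        while (HeapTupleIsValid(tup = heap_getnext(hscan, 0)))
        {
            /* ... use tup ... */
        }
        heap_endscan(hscan);

        /* --- pattern 2: index scan, then heap_fetch() of each tid --- */
        ScanKeyEntryInitialize(&key, 0, 1,  /* key now names the index column */
                               F_NAMEEQ, PointerGetDatum(typname));
        irel = index_openr(TypeNameIndex);  /* assumed catalog-index name */
        iscan = index_beginscan(irel, false, 1, &key);
        while ((ires = index_getnext(iscan, ForwardScanDirection)) != NULL)
        {
            htup.t_self = ires->heap_iptr;  /* tid handed back by the index */
            heap_fetch(typeRel, SnapshotNow, &htup, &buf);
            pfree(ires);
            if (htup.t_data == NULL)
                continue;                   /* tuple not visible; skip it */
            /* ... use htup ... */
            ReleaseBuffer(buf);
        }
        index_endscan(iscan);
        index_close(irel);
    }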

Comments?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Look at heap_beginscan()

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Just do a search for heap_beginscan() and look at all those system table
> heap scans. Clearly, for large installations, we should be doing index
> scans.

There are a bunch of heap_beginscan() calls, but my first impression
from a quick scan is that most of them are in very non-performance-
critical paths --- not to mention paths that are deliberately ignoring
indexes because they're bootstrap or reindex code. Furthermore, some
of the remainder are scans of pretty darn small tables. (Do we need
to convert sequential scans of pg_am to indexed scans? Nyet.)

I'd be real hesitant to do a wholesale conversion, and even more
hesitant to add new system indexes to support indexscans that we
have not *proven* to be performance bottlenecks.

It's certainly something worth looking at, since we've identified
a couple of places like this that are indeed hotspots. But we need
to convince ourselves that other places are also hotspots before
we add overhead in hopes of making those places faster.

regards, tom lane

#3 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#2)
Re: Look at heap_beginscan()

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Just do a search for heap_beginscan() and look at all those system table
>> heap scans. Clearly, for large installations, we should be doing index
>> scans.
>
> There are a bunch of heap_beginscan() calls, but my first impression
> from a quick scan is that most of them are in very non-performance-
> critical paths --- not to mention paths that are deliberately ignoring
> indexes because they're bootstrap or reindex code. Furthermore, some
> of the remainder are scans of pretty darn small tables. (Do we need
> to convert sequential scans of pg_am to indexed scans? Nyet.)
>
> I'd be real hesitant to do a wholesale conversion, and even more
> hesitant to add new system indexes to support indexscans that we
> have not *proven* to be performance bottlenecks.

Well, how do we know where the critical paths are? Seems adding an
index is a cheap way to know we have all paths covered. If the table is
not updated, the index is really no overhead except 16k of disk space.
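
(Presumably that 16k is the btree metapage plus a single root/leaf page,
i.e. two 8k blocks, for an index that never grows beyond one leaf.)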

> It's certainly something worth looking at, since we've identified
> a couple of places like this that are indeed hotspots. But we need
> to convince ourselves that other places are also hotspots before
> we add overhead in hopes of making those places faster.

Are you suggesting that heap scan is faster than index in most of these
cases? How many rows does it take for a heap scan to be faster than an
index scan?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#4 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Bruce Momjian (#3)
AW: Look at heap_beginscan()

>> I'd be real hesitant to do a wholesale conversion, and even more
>> hesitant to add new system indexes to support indexscans that we
>> have not *proven* to be performance bottlenecks.
>
> Well, how do we know where the critical paths are? Seems adding an
> index is a cheap way to know we have all paths covered. If the table is
> not updated, the index is really no overhead except 16k of disk space.

I think the overhead is measurable, and I agree that adding indexes
only to proven performance-critical paths is the way to go.

>> It's certainly something worth looking at, since we've identified
>> a couple of places like this that are indeed hotspots. But we need
>> to convince ourselves that other places are also hotspots before
>> we add overhead in hopes of making those places faster.
>
> Are you suggesting that heap scan is faster than index in most of these
> cases?

Yes, that is what I would guess.

> How many rows does it take for a heap scan to be faster than an
> index scan?

I would say we can seq read at least 256k before the index starts
to perform better.
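
(For scale: at the default 8k block size, 256k is only 32 heap pages,
which a sequential scan reads in one pass, while an index scan pays a few
btree page reads plus a random heap fetch per matching row. A crossover
somewhere in that tens-of-pages range seems plausible, though caching can
move it a lot either way.)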

This brings me to another idea. Why do our indexes need at least
one level? Why can't we have an index that starts with one leaf page,
and only if that fills up introduce the first level?

Andreas

#5 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Zeugswetter Andreas SB (#4)
Re: AW: Look at heap_beginscan()

>>> It's certainly something worth looking at, since we've identified
>>> a couple of places like this that are indeed hotspots. But we need
>>> to convince ourselves that other places are also hotspots before
>>> we add overhead in hopes of making those places faster.
>>
>> Are you suggesting that heap scan is faster than index in most of these
>> cases?
>
> Yes, that is what I would guess.
>
>> How many rows does it take for a heap scan to be faster than an
>> index scan?
>
> I would say we can seq read at least 256k before the index starts
> to perform better.
>
> This brings me to another idea. Why do our indexes need at least
> one level? Why can't we have an index that starts with one leaf page,
> and only if that fills up introduce the first level?

OK, let's look at pg_type. We have many sequential scans of that table
in a number of places. A row is added to it for every table created by
the user. My question is which tables do we _know_ are a fixed size,
and which vary based on the number of tables/indexes/views installed by
the user. Seems in those cases, we have to use index scans because we
don't know what the size of the table will be. Same with sequential
scans of pg_index, but we already know that is a problem.

Another issue is the use of the cache. If I add cache lookups to
replace some of the sequential scans, I would like to have indexes to
use for cache loads, though I realize some are saying the sequential
scans for cache loads are faster in some cases too.
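
For what it's worth, the cache-lookup side is cheap to write. A minimal
sketch using roughly the 7.x-era syscache API (the TYPENAME cache id and
the returned-OID detail are assumptions for illustration; later releases
moved to SearchSysCache()/ReleaseSysCache()):

    /*
     * Sketch: replacing an explicit pg_type scan with a syscache lookup,
     * using roughly the 7.x-era API.  Illustrative only, not a patch.
     */
    #include "postgres.h"
    #include "utils/syscache.h"

    static Oid
    lookup_type_oid(char *typname)
    {
        HeapTuple   tup;

        tup = SearchSysCacheTuple(TYPENAME,
                                  PointerGetDatum(typname),
                                  0, 0, 0);
        if (!HeapTupleIsValid(tup))
            return InvalidOid;          /* no such type */

        /* the cached catalog tuple carries the type's OID in its header */
        return tup->t_data->t_oid;
    }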

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026