vacuum analyze again...

Started by Jean-Christophe Boggio · about 25 years ago · 12 messages · general
#1 Jean-Christophe Boggio
cat@thefreecat.org

Hi,

In Oracle, there are 2 ways to do the equivalent of vacuum analyze :

* analyze table xx compute statistics
* analyze table xx estimate statistics

In the second form, you can specify what percentage of the table to
sample for the statistics. This is useful for a quick analyze on huge
tables that have a homogeneous distribution.

Is there a way to "estimate" the statistics that vacuum analyze will
use instead of "computing" them?
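[Editor's note: the intuition behind "estimate" statistics can be sketched with a small simulation: on a homogeneous column, a 2% random sample yields a mean very close to the full-scan value. This is an illustrative Python sketch, not Oracle or PostgreSQL code; the column data and the 2% rate are assumptions.]

```python
import random

random.seed(42)
# Fake "table": one integer column with a homogeneous distribution.
column = [random.randint(1, 1000) for _ in range(100_000)]

def estimate_mean(rows, fraction):
    """Estimate the column mean from a random sample of the given fraction."""
    sample = random.sample(rows, int(len(rows) * fraction))
    return sum(sample) / len(sample)

full_mean = sum(column) / len(column)        # "compute statistics"
sampled_mean = estimate_mean(column, 0.02)   # "estimate statistics", 2% sample
# On homogeneous data the sampled estimate lands very close to the full scan.
print(abs(full_mean - sampled_mean) / full_mean)
```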

--
Jean-Christophe Boggio
cat@thefreecat.org
Independent Consultant and Developer
Delphi, Linux, Perl, PostgreSQL

#2 Bruce Momjian
bruce@momjian.us
In reply to: Jean-Christophe Boggio (#1)
Re: vacuum analyze again...


> Hi,
>
> In Oracle, there are 2 ways to do the equivalent of vacuum analyze:
>
> * analyze table xx compute statistics
> * analyze table xx estimate statistics
>
> In the second form, you can specify what percentage of the table to
> sample for the statistics. This is useful for a quick analyze on huge
> tables that have a homogeneous distribution.
>
> Is there a way to "estimate" the statistics that vacuum analyze will
> use instead of "computing" them?

No, we have no ability to randomly pick rows to use for estimating
statistics. Should we have this ability?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3 Chris Jones
chris@mt.sri.com
In reply to: Bruce Momjian (#2)
Re: vacuum analyze again...

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> No, we have no ability to randomly pick rows to use for estimating
> statistics. Should we have this ability?

That would be really slick, especially given the fact that VACUUM runs
much faster than VACUUM ANALYZE for a lot of PG users. I could change
my daily maintenance scripts to do a VACUUM of everything, followed by
a VACUUM ANALYZE of the small tables, followed by a VACUUM ANALYZE
ESTIMATE (or whatever) of the large tables.

Even cooler would be the ability to set a table size threshold, so
that VACUUM ANALYZE would automatically choose the appropriate method
based on the table size.
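[Editor's note: the threshold idea could look something like the sketch below. The `VACUUM ANALYZE ESTIMATE` syntax and the one-million-row cutoff are hypothetical, taken from the discussion above; no such command exists in PostgreSQL.]

```python
# Hypothetical maintenance logic: full ANALYZE for small tables,
# a sampled ANALYZE for large ones.  The commands are illustrative only.
THRESHOLD_ROWS = 1_000_000  # arbitrary cutoff, chosen for illustration

def vacuum_command(table, row_count):
    """Pick a maintenance command based on table size (hypothetical syntax)."""
    if row_count < THRESHOLD_ROWS:
        return f"VACUUM ANALYZE {table};"
    # "ESTIMATE" is the hypothetical sampled variant proposed in this thread.
    return f"VACUUM ANALYZE ESTIMATE {table};"

print(vacuum_command("small_tbl", 10_000))
# prints "VACUUM ANALYZE small_tbl;"
```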

Chris

--
chris@mt.sri.com -----------------------------------------------------
Chris Jones SRI International, Inc.
www.sri.com

#4 Bruce Momjian
bruce@momjian.us
In reply to: Chris Jones (#3)
Re: vacuum analyze again...

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > No, we have no ability to randomly pick rows to use for estimating
> > statistics. Should we have this ability?
>
> That would be really slick, especially given the fact that VACUUM runs
> much faster than VACUUM ANALYZE for a lot of PG users. I could change
> my daily maintenance scripts to do a VACUUM of everything, followed by
> a VACUUM ANALYZE of the small tables, followed by a VACUUM ANALYZE
> ESTIMATE (or whatever) of the large tables.
>
> Even cooler would be the ability to set a table size threshold, so
> that VACUUM ANALYZE would automatically choose the appropriate method
> based on the table size.

Added to TODO:

* Allow ANALYZE to process a certain random percentage of rows

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#4)
Re: vacuum analyze again...

> Bruce Momjian writes:
>
> > No, we have no ability to randomly pick rows to use for estimating
> > statistics. Should we have this ability?
>
> How's reading a sufficiently large fraction of random rows going to be
> significantly faster than reading all rows? If you're just going to read
> the first n rows then that isn't really random, is it?

Ingres did this too, I think. You could specify a certain percentage of
random rows, perhaps 10%. On a large table, that is often good enough
and much faster. Often 2% is enough.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#2)
Re: vacuum analyze again...

Bruce Momjian writes:

> No, we have no ability to randomly pick rows to use for estimating
> statistics. Should we have this ability?

How's reading a sufficiently large fraction of random rows going to be
significantly faster than reading all rows? If you're just going to read
the first n rows then that isn't really random, is it?

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#5)
Re: vacuum analyze again...

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> > How's reading a sufficiently large fraction of random rows going to be
> > significantly faster than reading all rows? If you're just going to read
> > the first n rows then that isn't really random, is it?
>
> Ingres did this too, I think. You could specify a certain percentage of
> random rows, perhaps 10%. On a large table, that is often good enough
> and much faster. Often 2% is enough.

Peter's got a good point though. Even 2% is going to mean fetching most
or all of the blocks in the table, for typical-size rows. Furthermore,
fetching (say) every second or third block is likely to be actually
slower than a straight sequential read, because you're now fighting the
kernel's readahead policy instead of working with it.

To get a partial VACUUM ANALYZE that was actually usefully faster than
the current code, I think you'd have to read just a few percent of the
blocks, which means much less than a few percent of the rows ... unless
maybe you picked selected blocks but then used all the rows in those
blocks ... but is that a random sample? It's debatable.
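[Editor's note: the block arithmetic here can be made concrete. If rows are sampled independently at rate p and each block holds r rows, a block must be read with probability 1 - (1-p)^r. The 50 rows-per-block figure below is an assumption for illustration:]

```python
# Expected fraction of blocks that must be read when sampling rows at
# rate p, assuming sampled rows are scattered uniformly across blocks
# that each hold rows_per_block rows.
def blocks_touched(p, rows_per_block):
    return 1 - (1 - p) ** rows_per_block

# Even a "2% sample" of rows forces reading most of the table's blocks.
for pct in (0.02, 0.10):
    print(f"{pct:.0%} of rows -> {blocks_touched(pct, 50):.0%} of blocks")
# 2% of rows -> 64% of blocks
# 10% of rows -> 99% of blocks
```

This is why a usefully faster partial ANALYZE has to sample blocks, not rows.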

I find it hard to believe that VAC ANALYZE is all that much slower than
plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
my experience. It would be useful to know exactly what the columns are
in a table where VAC ANALYZE is considered unusably slow.

regards, tom lane

#8 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
Re: vacuum analyze again...

> To get a partial VACUUM ANALYZE that was actually usefully faster than
> the current code, I think you'd have to read just a few percent of the
> blocks, which means much less than a few percent of the rows ... unless
> maybe you picked selected blocks but then used all the rows in those
> blocks ... but is that a random sample? It's debatable.
>
> I find it hard to believe that VAC ANALYZE is all that much slower than
> plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
> my experience. It would be useful to know exactly what the columns are
> in a table where VAC ANALYZE is considered unusably slow.

VACUUM ANALYZE does a huge number of adt/ (datatype) function calls.
It must be those calls that make ANALYZE slower. People report that
ANALYZE is certainly slower, and that is the only difference.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#8)
Re: vacuum analyze again...

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> > I find it hard to believe that VAC ANALYZE is all that much slower than
> > plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
> > my experience. It would be useful to know exactly what the columns are
> > in a table where VAC ANALYZE is considered unusably slow.
>
> VACUUM ANALYZE does a huge number of adt/ function calls. It must be
> those calls that make ANALYZE slower. People report ANALYZE is
> certainly slower, and that is the only difference.

That's why I'm asking what the data is. The function calls per se can't
be that slow; I think there must be some datatype-specific issue.

With TOAST in the mix, TOAST fetches could very well be an issue, but
I didn't think 7.1 was being discussed ...

regards, tom lane

#10 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#9)
Re: vacuum analyze again...

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > > I find it hard to believe that VAC ANALYZE is all that much slower than
> > > plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
> > > my experience. It would be useful to know exactly what the columns are
> > > in a table where VAC ANALYZE is considered unusably slow.
> >
> > VACUUM ANALYZE does a huge number of adt/ function calls. It must be
> > those calls that make ANALYZE slower. People report ANALYZE is
> > certainly slower, and that is the only difference.
>
> That's why I'm asking what the data is. The function calls per se can't
> be that slow; I think there must be some datatype-specific issue.
>
> With TOAST in the mix, TOAST fetches could very well be an issue, but
> I didn't think 7.1 was being discussed ...

I would love to hear what the issue is with ANALYZE. There isn't much
going on with ANALYZE except the function calls.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#11 Pete Forman
pete.forman@westerngeco.com
In reply to: Bruce Momjian (#4)
Re: vacuum analyze again...

Bruce Momjian writes:

> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >
> > > No, we have no ability to randomly pick rows to use for
> > > estimating statistics. Should we have this ability?
> >
> > That would be really slick, especially given the fact that VACUUM
> > runs much faster than VACUUM ANALYZE for a lot of PG users. I
> > could change my daily maintenance scripts to do a VACUUM of
> > everything, followed by a VACUUM ANALYZE of the small tables,
> > followed by a VACUUM ANALYZE ESTIMATE (or whatever) of the large
> > tables.
> >
> > Even cooler would be the ability to set a table size threshold,
> > so that VACUUM ANALYZE would automatically choose the appropriate
> > method based on the table size.
>
> Added to TODO:
>
> * Allow ANALYZE to process a certain random percentage of rows

Does this reduced analysis need to be random? Why not allow the DBA
to specify which rows or blocks to process in some way?
--
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco -./\.- by myself and does not represent
pete.forman@westerngeco.com -./\.- opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef -./\.- Hughes or their divisions.

#12 Bruce Momjian
bruce@momjian.us
In reply to: Pete Forman (#11)
Re: vacuum analyze again...

> > Added to TODO:
> >
> > * Allow ANALYZE to process a certain random percentage of rows
>
> Does this reduced analysis need to be random? Why not allow the DBA
> to specify which rows or blocks to process in some way?

No, we are not about to add the kitchen sink here. If you really want
to control the statistics values, just update pg_attribute and
pg_statistic.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026