pg_dump ANALYZE statements

Started by Simon Riggsalmost 19 years ago7 messages
#1Simon Riggs
simon@2ndquadrant.com

There is currently a performance tip to run ANALYZE after a pg_dump
script has been restored.

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option.
-z on | off
--analyze=on | off

This would add a table-specific ANALYZE statement following each table's
actions.

I'm not aware of a strong argument against such an option. Performance
surely can't be one because the time saved on the ANALYZE will quickly
bite back on time lost on poorly planned queries.

What does the panel think?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: pg_dump ANALYZE statements

"Simon Riggs" <simon@2ndquadrant.com> writes:

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

Having pg_dump emit ANALYZE was discussed and rejected years ago.
Have you read that discussion? Do you have any new arguments to make?

regards, tom lane

#3Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Simon Riggs (#1)
Re: pg_dump ANALYZE statements

Simon Riggs wrote:

There is currently a performance tip to run ANALYZE after a pg_dump
script has been restored.

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option.
-z on | off
--analyze=on | off

This would add a table-specific ANALYZE statement following each table's
actions.

I'm not aware of a strong argument against such an option. Performance
surely can't be one because the time saved on the ANALYZE will quickly
bite back on time lost on poorly planned queries.

What does the panel think?

how is this going to interact with the (now by default enabled)
autovacuum daemon ?

Stefan

#4Gregory Stark
stark@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: pg_dump ANALYZE statements

"Simon Riggs" <simon@2ndquadrant.com> writes:

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option.
-z on | off
--analyze=on | off

Only an aesthetic comment:

Short options don't usually take on/off arguments, I would suggest making the
default be to analyze and make -z and --analyze=off disable the analyze.

You might also consider having a --analyze=verbose and perhaps a
--analyze=full though currently that would require doing vacuum analyze.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#5Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: pg_dump ANALYZE statements

On Mon, 2007-01-22 at 10:49 -0500, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

Having pg_dump emit ANALYZE was discussed and rejected years ago.
Have you read that discussion? Do you have any new arguments to make?

Well, its been suggested before, but I can't see any good arguments
against. Specifically, there was one person who spoke in favour of it
last time it was mentioned.

http://archives.postgresql.org/pgsql-hackers/2003-02/msg01270.php

If its a performance tip, we should be doing it automatically. If we
genuinely believe that autovacuum will handle it, then we can simply
alter the docs to say: if you aren't running autovacuum, then don't
forget to ANALYZE. IMHO it is not sufficient to rely upon autovacuum to
do all of this work for us.

Rod mentions the discussion has been raised before but doesn't state
what the arguments were:
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01264.php

These other posts also seem to be in favour of the idea...
http://archives.postgresql.org/pgsql-performance/2006-10/msg00142.php
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01273.php

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#6Jim C. Nasby
jim@nasby.net
In reply to: Simon Riggs (#1)
Re: pg_dump ANALYZE statements

On Mon, Jan 22, 2007 at 03:40:17PM +0000, Simon Riggs wrote:

This would add a table-specific ANALYZE statement following each table's
actions.

It'd probably be best to put it before any index creating activities,
since there's a better chance of everything from the table being in
shared buffers.

Better yet would be if COPY could analyze data as it was loaded in...
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim C. Nasby (#6)
Re: pg_dump ANALYZE statements

"Jim C. Nasby" <jim@nasby.net> writes:

On Mon, Jan 22, 2007 at 03:40:17PM +0000, Simon Riggs wrote:

This would add a table-specific ANALYZE statement following each table's
actions.

It'd probably be best to put it before any index creating activities,

No, because then you'd fail to accumulate any stats on partial or
functional indexes. There's been talk of using the presence of
multi-column indexes to guide creation of cross-column statistics, too.

regards, tom lane