analyze-in-stages post upgrade questions

Started by Zechman, Derek S10 months ago28 messageshackersgeneral
Jump to latest
#1Zechman, Derek S
Derek.S.Zechman@snapon.com
hackersgeneral

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and performed the analyze-in-stages post upgrade. It has been noticed that some plans changed to use hash joins instead of nested loops. Further investigation found it was because the parent table of partitioned tables did not have stats. After running an ANALYZE on the parent tables we got similar plan an execution times as before.

I have two questions
1 - Why does analyze-in-stages not analyze the parent tables?
2 - What happens if we do not run analyze-in-stages post upgrade and just run an analyze?

Thanks,
Sean

#2Ron
ronljohnsonjr@gmail.com
In reply to: Zechman, Derek S (#1)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Fri, Jun 27, 2025 at 9:35 AM Zechman, Derek S <Derek.S.Zechman@snapon.com>
wrote:

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
performed the analyze-in-stages post upgrade. It has been noticed that
some plans changed to use hash joins instead of nested loops. Further
investigation found it was because the parent table of partitioned tables
did not have stats. After running an ANALYZE on the parent tables we got
similar plan an execution times as before.

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables?

2 – What happens if we do not run analyze-in-stages post upgrade and just
run an analyze?

It takes more time, and you don't have *any* statistics on a given table
until the ANALYZE on that table completes.

How long did "vacuumdb --analyze-only --jobs=$mumble your_db" take?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Zechman, Derek S (#1)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On 6/27/25 06:35, Zechman, Derek S wrote:

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
performed the analyze-in-stages post upgrade.  It has been noticed that
some plans changed to use hash joins instead of nested loops.  Further
investigation found it was because the parent table of partitioned
tables did not have stats.  After running an ANALYZE on the parent
tables we got similar plan an execution times as before.

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables?

2 – What happens if we do not run analyze-in-stages post upgrade and
just run an analyze?

It is spelled out in the docs:

https://www.postgresql.org/docs/current/pgupgrade.html

Emphasis added

"Using vacuumdb --all --analyze-only can efficiently generate such
statistics, and the use of --jobs can speed it up. Option
--analyze-in-stages can be used to generate **minimal statistics**
quickly. If vacuum_cost_delay is set to a non-zero value, this can be
overridden to speed up statistics generation using PGOPTIONS, e.g.,
PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."

and from here:

https://www.postgresql.org/docs/current/app-vacuumdb.html

"--analyze-in-stages

Only calculate statistics for use by the optimizer (no vacuum),
like --analyze-only. Run three stages of analyze; the first stage uses
the lowest possible statistics target (see default_statistics_target) to
produce usable statistics faster, and subsequent stages build the full
statistics.

This option is only useful to analyze a database that currently has
no statistics or has wholly incorrect ones, such as if it is newly
populated from a restored dump or by pg_upgrade. Be aware that running
with this option in a database with existing statistics may cause the
query optimizer choices to become transiently worse due to the low
statistics targets of the early stages.
"

Thanks,

Sean

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Adrian Klaver (#3)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Fri, 2025-06-27 at 08:31 -0700, Adrian Klaver wrote:

On 6/27/25 06:35, Zechman, Derek S wrote:

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
performed the analyze-in-stages post upgrade.  It has been noticed that
some plans changed to use hash joins instead of nested loops.  Further
investigation found it was because the parent table of partitioned
tables did not have stats.  After running an ANALYZE on the parent
tables we got similar plan an execution times as before.

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables?

2 – What happens if we do not run analyze-in-stages post upgrade and
just run an analyze?

It is spelled out in the docs:

https://www.postgresql.org/docs/current/pgupgrade.html

Emphasis added

"Using vacuumdb --all --analyze-only can efficiently generate such
statistics, and the use of --jobs can speed it up. Option
--analyze-in-stages can be used to generate **minimal statistics**
quickly. If vacuum_cost_delay is set to a non-zero value, this can be
overridden to speed up statistics generation using PGOPTIONS, e.g.,
PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."

and from here:

https://www.postgresql.org/docs/current/app-vacuumdb.html

"--analyze-in-stages

Only calculate statistics for use by the optimizer (no vacuum),
like --analyze-only. Run three stages of analyze; the first stage uses
the lowest possible statistics target (see default_statistics_target) to
produce usable statistics faster, and subsequent stages build the full
statistics.

This option is only useful to analyze a database that currently has
no statistics or has wholly incorrect ones, such as if it is newly
populated from a restored dump or by pg_upgrade. Be aware that running
with this option in a database with existing statistics may cause the
query optimizer choices to become transiently worse due to the low
statistics targets of the early stages.

Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.

Derek, can cou share the pg_stats entries for the partitioned table?

Yours,
Laurenz Albe

#5Zechman, Derek S
Derek.S.Zechman@snapon.com
In reply to: Ron (#2)
hackersgeneral
RE: analyze-in-stages post upgrade questions

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and performed the analyze-in-stages post upgrade. It has been noticed that some plans changed to use hash joins instead of nested loops. Further investigation found it was because the parent table of partitioned tables did not have stats. After running an ANALYZE on the parent tables we got similar plan an execution times as before.

I have two questions
1 - Why does analyze-in-stages not analyze the parent tables?
2 – What happens if we do not run analyze-in-stages post upgrade and just run an analyze?

“It takes more time, and you don't have any statistics on a given table until the ANALYZE on that table completes.

How long did "vacuumdb --analyze-only --jobs=$mumble your_db" take?”

Thanks – that makes sense. I understand what analyze in stages does just wish it would include parent tables.

"vacuumdb --all --analyze-only --jobs=7" took about 75 minutes where the analyze-in-stages after upgrade took 115 minutes. Neither of these activities analyzed the parent tables.
Reading more and it seems vacuumdb doesn’t analyze parent tables and a manual analyze on those is needed if we want better planner statistics.

#6Zechman, Derek S
Derek.S.Zechman@snapon.com
In reply to: Laurenz Albe (#4)
hackersgeneral
RE: analyze-in-stages post upgrade questions

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and

performed the analyze-in-stages post upgrade. It has been noticed that

some plans changed to use hash joins instead of nested loops. Further

investigation found it was because the parent table of partitioned

tables did not have stats. After running an ANALYZE on the parent

tables we got similar plan an execution times as before.

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables?

2 – What happens if we do not run analyze-in-stages post upgrade and

just run an analyze?

It is spelled out in the docs:

https://urldefense.com/v3/__https://www.postgresql.org/docs/current/pgupgrade.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvnFFR6Ws$&lt;https://urldefense.com/v3/__https:/www.postgresql.org/docs/current/pgupgrade.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvnFFR6Ws$&gt;

Emphasis added

"Using vacuumdb --all --analyze-only can efficiently generate such

statistics, and the use of --jobs can speed it up. Option

--analyze-in-stages can be used to generate **minimal statistics**

quickly. If vacuum_cost_delay is set to a non-zero value, this can be

overridden to speed up statistics generation using PGOPTIONS, e.g.,

PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."

and from here:

https://urldefense.com/v3/__https://www.postgresql.org/docs/current/app-vacuumdb.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvvC7gfd0$&lt;https://urldefense.com/v3/__https:/www.postgresql.org/docs/current/app-vacuumdb.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvvC7gfd0$&gt;

"--analyze-in-stages

Only calculate statistics for use by the optimizer (no vacuum),

like --analyze-only. Run three stages of analyze; the first stage uses

the lowest possible statistics target (see default_statistics_target) to

produce usable statistics faster, and subsequent stages build the full

statistics.

This option is only useful to analyze a database that currently has

no statistics or has wholly incorrect ones, such as if it is newly

populated from a restored dump or by pg_upgrade. Be aware that running

with this option in a database with existing statistics may cause the

query optimizer choices to become transiently worse due to the low

statistics targets of the early stages.

Well, that wouldn't explain why it doesn't work on partitioned tables.

I am under the impression that it should.

Derek, can cou share the pg_stats entries for the partitioned table?

Yours,

Laurenz Albe

There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below

=> select relname, reltuples, relkind from pg_class where relname ~ '^chapter_[0-9]+$' or relname='chapter' order by 1;

relname | reltuples | relkind

-------------+-----------+---------

chapter | -1 | p

chapter_1 | 4 | r

chapter_10 | 4 | r

chapter_100 | 30 | r

chapter_101 | 15 | r

chapter_102 | 15 | r

=> select count(*) from pg_stats where tablename='chapter';

count

-------

0

(1 row)

=> analyze chapter;

ANALYZE

=> select relname, reltuples, relkind from pg_class where relkind ='p' and relname='chapter';

relname | reltuples | relkind

---------+-----------+---------

chapter | 7589 | p

(1 row)

=> select count(*) from pg_stats where tablename='chapter';

count

-------

49

(1 row)

toy_epc_stg_1_db=>

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Zechman, Derek S (#6)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Sat, 2025-06-28 at 01:23 +0000, Zechman, Derek S wrote:

Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.

Derek, can cou share the pg_stats entries for the partitioned table?

There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below

You are right. I looked at the code, and "vacuumdb" does not process
partitiond tables, even if --analyze-only is specified. I find that
surprising, aince the SQL command ANALYZE (without a table name) will
also collect statistics for partitioned tables.

I think that it would be a good idea to change that behavior.
In particular, it makes a lot of sense to collect statistics for
partitioned tables after a "pg_upgrade".

Attached is a patch to make "vacuumdb --analyze-only" consider
partitioned tables as well.

Yours,
Laurenz Albe

Attachments:

v1-0001-Make-vacuumdb-Z-process-partitioned-tables.patchtext/x-patch; charset=UTF-8; name=v1-0001-Make-vacuumdb-Z-process-partitioned-tables.patchDownload+23-5
#8Mircea Cadariu
cadariu.mircea@gmail.com
In reply to: Laurenz Albe (#7)
hackersgeneral
Re: analyze-in-stages post upgrade questions

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi Laurenz,

Nice focused patch.

Applied successfully on commit 62a17a92 from master.

Documentation is updated and there is an explanatory comment for the code, as well as a descriptive commit message.

To check the patch, I added the following test at the end of src/bin/scripts/t/100_vacuumdb.pl in both master (as experiment) and your patch. In master it does not pass, but with your patch applied it does.

You can consider adding it to your patch, or I could also do that.

$node->safe_psql('postgres',
"CREATE TABLE parent_table (a INT) PARTITION BY LIST (a);\n"
. "CREATE TABLE child_table PARTITION OF parent_table FOR VALUES IN (1);\n"
. "INSERT INTO parent_table VALUES (1);\n");
$node->issues_sql_like(
[
'vacuumdb', '--analyze-only', 'postgres'
],
qr/statement:\s+ANALYZE\s+public\.parent_table/s,
'--analyze_only updates statistics for partitioned tables');

Kind regards,
Mircea Cadariu

The new status of this patch is: Waiting on Author

#9Zechman, Derek S
Derek.S.Zechman@snapon.com
In reply to: Laurenz Albe (#7)
hackersgeneral
RE: analyze-in-stages post upgrade questions

Well, that wouldn't explain why it doesn't work on partitioned tables.

I am under the impression that it should.

Derek, can cou share the pg_stats entries for the partitioned table?

There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below

You are right. I looked at the code, and "vacuumdb" does not process

partitiond tables, even if --analyze-only is specified. I find that

surprising, aince the SQL command ANALYZE (without a table name) will

also collect statistics for partitioned tables.

I think that it would be a good idea to change that behavior.

In particular, it makes a lot of sense to collect statistics for

partitioned tables after a "pg_upgrade".

Attached is a patch to make "vacuumdb --analyze-only" consider

partitioned tables as well.

Yours,

Laurenz Albe

Is there a plan to include this patch in future releases/patches of postgres?

Thanks,

(Derek) Sean

#10Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Zechman, Derek S (#9)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Wed, 2025-07-09 at 11:30 +0000, Zechman, Derek S wrote:

There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below

You are right.  I looked at the code, and "vacuumdb" does not process
partitiond tables, even if --analyze-only is specified.  I find that
surprising, aince the SQL command ANALYZE (without a table name) will
also collect statistics for partitioned tables.

I think that it would be a good idea to change that behavior.
In particular, it makes a lot of sense to collect statistics for
partitioned tables after a "pg_upgrade".

Attached is a patch to make "vacuumdb --analyze-only" consider
partitioned tables as well.

Is there a plan to include this patch in future releases/patches of postgres?

I have added the patch to the current commitfest:
https://commitfest.postgresql.org/patch/5871/

So far, it has not got any peer review. So yes, I'd like to include
the patch, but I cannot make it happen by myself.
Essentially, patches get applied if
a) they get peer review and
b) a committer applies them

If you want this to happen, the best thing you could do would be
to review the patch and see if it works for you, if it does what you
need and so on:
https://wiki.postgresql.org/wiki/Reviewing_a_Patch

Yours,
Laurenz Albe

#11Mircea Cadariu
cadariu.mircea@gmail.com
In reply to: Laurenz Albe (#10)
hackersgeneral
Re: analyze-in-stages post upgrade questions

Hi Laurenz,

On 09/07/2025 16:26, Laurenz Albe wrote:

I have added the patch to the current commitfest:
https://commitfest.postgresql.org/patch/5871/

Just to let you know that I have added a review through the commitfest app.

You can see it here:
/messages/by-id/175189219162.2200286.3306593311375985296.pgcf@coridan.postgresql.org

Kind regards,

Mircea Cadariu

#12Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Mircea Cadariu (#11)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Wed, 2025-07-09 at 17:37 +0100, Mircea Cadariu wrote:

Just to let you know that I have added a review through the commitfest app. 

Thanks!

The patch is still in state "needs review".
If there is something that I should change, you should set it to
"waiting on author". If you think that the patch is ready to go
as it is, please set it to "ready for committer".

Yours,
Laurenz Albe

#13Mircea Cadariu
cadariu.mircea@gmail.com
In reply to: Laurenz Albe (#12)
hackersgeneral
Re: analyze-in-stages post upgrade questions

Hi Laurenz,

Got it. I have only one suggestion for the patch. Consider adding a
corresponding test in src/bin/scripts/t/100_vacuumdb.pl.

Proposal (I used this to check the patch):

$node->safe_psql('postgres',
    "CREATE TABLE parent_table (a INT) PARTITION BY LIST (a);\n"
      . "CREATE TABLE child_table PARTITION OF parent_table FOR VALUES
IN (1);\n"
      . "INSERT INTO parent_table VALUES (1);\n");
$node->issues_sql_like(
    [
        'vacuumdb', '--analyze-only', 'postgres'
    ],
    qr/statement:\s+ANALYZE\s+public\.parent_table/s,
    '--analyze_only updates statistics for partitioned tables');

Kind regards,

Mircea Cadariu

#14Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Mircea Cadariu (#13)
hackersgeneral
Re: analyze-in-stages post upgrade questions

[moving to pgsql-hackers]
On Thu, 2025-07-10 at 17:20 +0100, Mircea Cadariu wrote:

I have only one suggestion for the patch. Consider adding a
corresponding test in src/bin/scripts/t/100_vacuumdb.pl.

Proposal (I used this to check the patch):

$node->safe_psql('postgres',
    "CREATE TABLE parent_table (a INT) PARTITION BY LIST (a);\n"
      . "CREATE TABLE child_table PARTITION OF parent_table FOR VALUES
IN (1);\n"
      . "INSERT INTO parent_table VALUES (1);\n");
$node->issues_sql_like(
    [
        'vacuumdb', '--analyze-only', 'postgres'
    ],
    qr/statement:\s+ANALYZE\s+public\.parent_table/s,
    '--analyze_only updates statistics for partitioned tables');

Good idea; done in the attached version 2 of the patch.

Yours,
Laurenz Albe

Attachments:

v2-0001-Make-vacuumdb-Z-process-partitioned-tables.patchtext/x-patch; charset=UTF-8; name=v2-0001-Make-vacuumdb-Z-process-partitioned-tables.patchDownload+35-5
#15Mircea Cadariu
cadariu.mircea@gmail.com
In reply to: Laurenz Albe (#14)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On 11/07/2025 10:51, Laurenz Albe wrote:

Good idea; done in the attached version 2 of the patch.

Thanks! Looks good. I have set the status of the Commitfest entry to
"Ready for Committer".

Kind regards,

Mircea Cadariu

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Mircea Cadariu (#15)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Fri, Jul 11, 2025 at 7:42 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:

On 11/07/2025 10:51, Laurenz Albe wrote:

Good idea; done in the attached version 2 of the patch.

Thanks! Looks good. I have set the status of the Commitfest entry to
"Ready for Committer".

I've started reviewing the patch since it's marked as ready for committer.

Overall, I like the change. But I have one question: should this be treated as
a bug fix that we back-patch to supported branches, or is it more of
an improvement that should only go into master?

         Only calculate statistics for use by the optimizer (no vacuum).
+        If that option is specified, <command>vacuumdb</command> will also
+        process partitioned tables.  Without that option, only the partitions
+        will be considered, unless a partitioned table is explicitly specified
+        with the <option>--table</option> option.

This wording seems a bit out of place in the --analyze-only section,
since it also describes the default behavior of vacuumdb without that option.
Wouldn't it make more sense to move that explanation in the --table section?

For example, we could add something like:

------------------
If no tables are specified with the --table option, vacuumdb will
clean all regular tables and materialized views in the connected
database. If --analyze-only or --analyze-in-stages is also specified,
it will analyze all regular tables, partitioned tables, and
materialized views (but not foreign tables).
------------------

+ /*
+ * VACUUMing partitioned tables would be unreasonably expensive, since
+ * that entails processing the partitions twice (once as part of the
+ * partitioned table, once as tables in their own right) for no
+ * benefit. But if we only ANALYZE, collecting statistics for
+ * partitioned tables is worth the effort.
+ */

This is probably true. But isn't the main reason more about aligning with
the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
docs says, "There is no effective difference between vacuuming and analyzing
databases via this utility and via other methods for accessing the server.",
so its default target objects should match: VACUUM skips partitioned tables
by default, while ANALYZE includes them. If that's the case, maybe the comment
should reflect that instead.

+ qr/statement:\s+ANALYZE\s+public\.parent_table/s,
+ '--analyze_only updates statistics for partitioned tables');

A plain space might be sufficient instead of \s+.
Also, I don't think the backslash before ".parent_table" is necessary.

Regards,

--
Fujii Masao

#17Mircea Cadariu
cadariu.mircea@gmail.com
In reply to: Fujii Masao (#16)
hackersgeneral
Re: analyze-in-stages post upgrade questions

Hi,

On 30/07/2025 12:49, Fujii Masao wrote:

I've started reviewing the patch since it's marked as ready for committer.

Thanks!

Overall, I like the change. But I have one question: should this be treated as
a bug fix that we back-patch to supported branches, or is it more of
an improvement that should only go into master?

I reckon it might make sense to back-patch it to previous versions, as
users might not upgrade always to the latest version.

Only calculate statistics for use by the optimizer (no vacuum).
+        If that option is specified, <command>vacuumdb</command> will also
+        process partitioned tables.  Without that option, only the partitions
+        will be considered, unless a partitioned table is explicitly specified
+        with the <option>--table</option> option.

This wording seems a bit out of place in the --analyze-only section,
since it also describes the default behavior of vacuumdb without that option.
Wouldn't it make more sense to move that explanation in the --table section?

For example, we could add something like:

------------------
If no tables are specified with the --table option, vacuumdb will
clean all regular tables and materialized views in the connected
database. If --analyze-only or --analyze-in-stages is also specified,
it will analyze all regular tables, partitioned tables, and
materialized views (but not foreign tables).
------------------

Yes, agreed.

+ /*
+ * VACUUMing partitioned tables would be unreasonably expensive, since
+ * that entails processing the partitions twice (once as part of the
+ * partitioned table, once as tables in their own right) for no
+ * benefit. But if we only ANALYZE, collecting statistics for
+ * partitioned tables is worth the effort.
+ */

This is probably true. But isn't the main reason more about aligning with
the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
docs says, "There is no effective difference between vacuuming and analyzing
databases via this utility and via other methods for accessing the server.",
so its default target objects should match: VACUUM skips partitioned tables
by default, while ANALYZE includes them. If that's the case, maybe the comment
should reflect that instead.

I see what you mean. From that perspective, I wonder if we even need a
comment there at all.

+ qr/statement:\s+ANALYZE\s+public\.parent_table/s,
+ '--analyze_only updates statistics for partitioned tables');

A plain space might be sufficient instead of \s+.
Also, I don't think the backslash before ".parent_table" is necessary.

Good catch! Indeed let's simplify that to contain strictly only what's
necessary.

Kind regards,

Mircea Cadariu

#18Fujii Masao
masao.fujii@gmail.com
In reply to: Mircea Cadariu (#17)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:

Overall, I like the change. But I have one question: should this be treated as
a bug fix that we back-patch to supported branches, or is it more of
an improvement that should only go into master?

I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version.

I understand your point. But on second thought, since the patch changes
behavior, I'm leaning toward treating it as an improvement, so it should
only go to master...

+ /*
+ * VACUUMing partitioned tables would be unreasonably expensive, since
+ * that entails processing the partitions twice (once as part of the
+ * partitioned table, once as tables in their own right) for no
+ * benefit. But if we only ANALYZE, collecting statistics for
+ * partitioned tables is worth the effort.
+ */

This is probably true. But isn't the main reason more about aligning with
the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
docs says, "There is no effective difference between vacuuming and analyzing
databases via this utility and via other methods for accessing the server.",
so its default target objects should match: VACUUM skips partitioned tables
by default, while ANALYZE includes them. If that's the case, maybe the comment
should reflect that instead.

I see what you mean. From that perspective, I wonder if we even need a comment there at all.

Or, if we keep it, though, I'd like to update it to something like
the following:

--------------------
vacuumdb should generally follow the behavior of the underlying
VACUUM and ANALYZE commands. If analyze_only is true, process
regular tables, materialized views, and partitioned tables, just like
ANALYZE (with no specific target tables) does. Otherwise, process
only regular tables and materialized views, since VACUUM skips
partitioned tables when no target tables are specified.
--------------------

Regards,

--
Fujii Masao

#19Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#18)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Wed, Aug 06, 2025 at 11:25:53PM +0900, Fujii Masao wrote:

On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:

Overall, I like the change. But I have one question: should this be treated as
a bug fix that we back-patch to supported branches, or is it more of
an improvement that should only go into master?

I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version.

I understand your point. But on second thought, since the patch changes
behavior, I'm leaning toward treating it as an improvement, so it should
only go to master...

I also am leaning towards treating this as v19 material. It's a nontrivial
behavior change, and this option is useful for major version upgrades,
which is an area that we really don't want to surprise users too much.
Furthermore, auto-analyze doesn't process partitioned tables, either, so
this introduces a bit of divergence. (I'd love to see that project picked
up again someday. Perhaps I will take a gander...)

--
nathan

#20Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#18)
hackersgeneral
Re: analyze-in-stages post upgrade questions

On Wed, 2025-08-06 at 23:25 +0900, Fujii Masao wrote:

On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:

Overall, I like the change. But I have one question: should this be treated as
a bug fix that we back-patch to supported branches, or is it more of
an improvement that should only go into master?

I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version.

I understand your point. But on second thought, since the patch changes
behavior, I'm leaning toward treating it as an improvement, so it should
only go to master...

I agree that this behavior change should not be backpatched.
That is not a bugfix.

+ /*
+ * VACUUMing partitioned tables would be unreasonably expensive, since
+ * that entails processing the partitions twice (once as part of the
+ * partitioned table, once as tables in their own right) for no
+ * benefit. But if we only ANALYZE, collecting statistics for
+ * partitioned tables is worth the effort.
+ */

This is probably true. But isn't the main reason more about aligning with
the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
docs says, "There is no effective difference between vacuuming and analyzing
databases via this utility and via other methods for accessing the server.",
so its default target objects should match: VACUUM skips partitioned tables
by default, while ANALYZE includes them. If that's the case, maybe the comment
should reflect that instead.

I see what you mean. From that perspective, I wonder if we even need a comment there at all.

Or, if we keep it, though, I'd like to update it to something like
the following:

--------------------
vacuumdb should generally follow the behavior of the underlying
VACUUM and ANALYZE commands. If analyze_only is true, process
regular tables, materialized views, and partitioned tables, just like
ANALYZE (with no specific target tables) does. Otherwise, process
only regular tables and materialized views, since VACUUM skips
partitioned tables when no target tables are specified.
--------------------

I am fine with that suggestion.

Alternatively, my original comment could be amended with

Besides, ANALYZE (without an option) processes partitioned tables, and
"vacuumdb -Z" should behave like ANALYZE.

Yours,
Laurenz Albe

#21Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#19)
hackersgeneral
#22Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#20)
hackersgeneral
#23Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#22)
hackersgeneral
#24Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#23)
hackersgeneral
#25Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#24)
hackersgeneral
#26Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#25)
hackersgeneral
#27Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#26)
hackersgeneral
#28Justin Pryzby
pryzby@telsasoft.com
In reply to: Fujii Masao (#26)
hackersgeneral