document the need to analyze partitioned tables

Started by Justin Pryzbyover 4 years ago38 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

Adding -hackers, sorry for the duplicate.

This seems to be deficient, citing
/messages/by-id/0d1b394b-bec9-8a71-a336-44df7078b295@gmail.com

I'm proposing something like the attached. Ideally, there would be a central
place to put details, and the other places could refer to that.

Since the autoanalyze patch was reverted, this should be easily applied to
backbranches, which is probably most of its value.

commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu Jul 22 16:06:18 2021 -0500

documentation deficiencies for ANALYZE of partitioned tables

This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5..decfabff5d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
     to meaningful statistical changes.
    </para>
+   <para>
+    Tuples changed in partitions and inheritence children do not count towards
+    analyze on the parent table.  If the parent table is empty or rarely
+    changed, it may never be processed by autovacuum.  It is necessary to
+    periodically run an manual <command>ANALYZE</command> to keep the statistics
+    of the table hierarchy up to date.
+   </para>
+
    <para>
     As with vacuuming for space recovery, frequent updates of statistics
     are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
      <command>ANALYZE</command> commands on those tables on a suitable schedule.
     </para>
    </tip>
+
+   <tip>
+    <para>
+     The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+     partitioned tables.  Inheritence parents will only be analyzed if the
+     parent is changed - changes to child tables do not trigger autoanalyze on
+     the parent table.  It is necessary to periodically run an manual
+     <command>ANALYZE</command> to keep the statistics of the table hierarchy up to
+     date.
+    </para>
+   </tip>
+
   </sect2>
   <sect2 id="vacuum-for-visibility-map">
@@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 </programlisting>
     is compared to the total number of tuples inserted, updated, or deleted
     since the last <command>ANALYZE</command>.
+
+    Partitioned tables are not processed by autovacuum, and their statistics
+    should be updated by manually running <command>ANALYZE</command> when the
+    table is first populated, and whenever the distribution of data in its
+    partitions changes significantly.
+   </para>
+
+   <para>
+    Partitioned tables are not processed by autovacuum.  Statistics
+    should be collected by running a manual <command>ANALYZE</command> when it is
+    first populated, and updated whenever the distribution of data in its
+    partitions changes significantly.
    </para>
    <para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..b84853fd6f 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    <title>Run <command>ANALYZE</command> Afterwards</title>
    <para>
+
     Whenever you have significantly altered the distribution of data
     within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This
     includes bulk loading large amounts of data into the table.  Running
+
     <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
     ensures that the planner has up-to-date statistics about the
     table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c423aeeea5..20ffbc2d7a 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
   </para>
   <para>
-    If the table being analyzed has one or more children,
-    <command>ANALYZE</command> will gather statistics twice: once on the
-    rows of the parent table only, and a second time on the rows of the
-    parent table with all of its children.  This second set of statistics
-    is needed when planning queries that traverse the entire inheritance
-    tree.  The autovacuum daemon, however, will only consider inserts or
-    updates on the parent table itself when deciding whether to trigger an
-    automatic analyze for that table.  If that table is rarely inserted into
-    or updated, the inheritance statistics will not be up to date unless you
-    run <command>ANALYZE</command> manually.
+    If the table being analyzed is partitioned, <command>ANALYZE</command>
+    will gather statistics by sampling blocks randomly from its partitions;
+    in addition, it will recurse into each partition and update its statistics.
+    (However, in multi-level partitioning scenarios, each leaf partition
+    will only be analyzed once.)
+    By constrast, if the table being analyzed has inheritance children,
+    <command>ANALYZE</command> will gather statistics for it twice:
+    once on the rows of the parent table only, and a second time on the
+    rows of the parent table with all of its children.  This second set of
+    statistics is needed when planning queries that traverse the entire
+    inheritance tree.  The child tables themselves are not individually
+    analyzed in this case.
   </para>
   <para>
-    If any of the child tables are foreign tables whose foreign data wrappers
-    do not support <command>ANALYZE</command>, those child tables are ignored while
-    gathering inheritance statistics.
+    The autovacuum daemon does not process partitioned tables or inheritence
+    parents.  It is usually necessary to periodically run a manual
+    <command>ANALYZE</command> to keep the statistics of the table hierarchy
+    up to date (except for nonempty inheritence parents which undergo
+    modifications of their own table data).
+    See...
+  </para>
+
+  <para>
+    If any of the child tables or partitions are foreign tables whose foreign
+    data wrappers do not support <command>ANALYZE</command>, those tables are
+    ignored while gathering inheritance statistics.
   </para>

<para>

#2Zhihong Yu
zyu@yugabyte.com
In reply to: Justin Pryzby (#1)
Re: document the need to analyze partitioned tables

On Sun, Sep 12, 2021 at 8:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Adding -hackers, sorry for the duplicate.

This seems to be deficient, citing

/messages/by-id/0d1b394b-bec9-8a71-a336-44df7078b295@gmail.com

I'm proposing something like the attached. Ideally, there would be a
central
place to put details, and the other places could refer to that.

Since the autoanalyze patch was reverted, this should be easily applied to
backbranches, which is probably most of its value.

commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu Jul 22 16:06:18 2021 -0500

documentation deficiencies for ANALYZE of partitioned tables

This is partially extracted from
1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5..decfabff5d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
to meaningful statistical changes.
</para>
+   <para>
+    Tuples changed in partitions and inheritence children do not count
towards
+    analyze on the parent table.  If the parent table is empty or rarely
+    changed, it may never be processed by autovacuum.  It is necessary to
+    periodically run an manual <command>ANALYZE</command> to keep the
statistics
+    of the table hierarchy up to date.
+   </para>
+
<para>
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
<command>ANALYZE</command> commands on those tables on a suitable
schedule.
</para>
</tip>
+
+   <tip>
+    <para>
+     The autovacuum daemon does not issue <command>ANALYZE</command>
commands for
+     partitioned tables.  Inheritence parents will only be analyzed if the
+     parent is changed - changes to child tables do not trigger
autoanalyze on
+     the parent table.  It is necessary to periodically run an manual
+     <command>ANALYZE</command> to keep the statistics of the table
hierarchy up to
+     date.
+    </para>
+   </tip>
+
</sect2>
<sect2 id="vacuum-for-visibility-map">
@@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze
scale factor * number of tu
</programlisting>
is compared to the total number of tuples inserted, updated, or
deleted
since the last <command>ANALYZE</command>.
+
+    Partitioned tables are not processed by autovacuum, and their
statistics
+    should be updated by manually running <command>ANALYZE</command> when
the
+    table is first populated, and whenever the distribution of data in its
+    partitions changes significantly.
+   </para>
+
+   <para>
+    Partitioned tables are not processed by autovacuum.  Statistics
+    should be collected by running a manual <command>ANALYZE</command>
when it is
+    first populated, and updated whenever the distribution of data in its
+    partitions changes significantly.
</para>
<para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..b84853fd6f 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND
somethingelse;
<title>Run <command>ANALYZE</command> Afterwards</title>
<para>
+
Whenever you have significantly altered the distribution of data
within a table, running <link
linkend="sql-analyze"><command>ANALYZE</command></link> is strongly
recommended. This
includes bulk loading large amounts of data into the table.  Running
+
<command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
ensures that the planner has up-to-date statistics about the
table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c423aeeea5..20ffbc2d7a 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable
class="parameter">table_and_columns</replacea
</para>
<para>
-    If the table being analyzed has one or more children,
-    <command>ANALYZE</command> will gather statistics twice: once on the
-    rows of the parent table only, and a second time on the rows of the
-    parent table with all of its children.  This second set of statistics
-    is needed when planning queries that traverse the entire inheritance
-    tree.  The autovacuum daemon, however, will only consider inserts or
-    updates on the parent table itself when deciding whether to trigger an
-    automatic analyze for that table.  If that table is rarely inserted
into
-    or updated, the inheritance statistics will not be up to date unless
you
-    run <command>ANALYZE</command> manually.
+    If the table being analyzed is partitioned, <command>ANALYZE</command>
+    will gather statistics by sampling blocks randomly from its
partitions;
+    in addition, it will recurse into each partition and update its
statistics.
+    (However, in multi-level partitioning scenarios, each leaf partition
+    will only be analyzed once.)
+    By constrast, if the table being analyzed has inheritance children,
+    <command>ANALYZE</command> will gather statistics for it twice:
+    once on the rows of the parent table only, and a second time on the
+    rows of the parent table with all of its children.  This second set of
+    statistics is needed when planning queries that traverse the entire
+    inheritance tree.  The child tables themselves are not individually
+    analyzed in this case.
</para>
<para>
-    If any of the child tables are foreign tables whose foreign data
wrappers
-    do not support <command>ANALYZE</command>, those child tables are
ignored while
-    gathering inheritance statistics.
+    The autovacuum daemon does not process partitioned tables or
inheritence
+    parents.  It is usually necessary to periodically run a manual
+    <command>ANALYZE</command> to keep the statistics of the table
hierarchy
+    up to date (except for nonempty inheritence parents which undergo
+    modifications of their own table data).
+    See...
+  </para>
+
+  <para>
+    If any of the child tables or partitions are foreign tables whose
foreign
+    data wrappers do not support <command>ANALYZE</command>, those tables
are
+    ignored while gathering inheritance statistics.
</para>

<para>

Hi,

Minor comment:

periodically run an manual -> periodically run a manual

Cheers

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#1)
Re: document the need to analyze partitioned tables

Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should revert relevant
portions of this patch.

Attachments:

0001-documentation-deficiencies-for-ANALYZE-of-partitione.patchtext/x-diff; charset=us-asciiDownload+52-14
#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#3)
Re: document the need to analyze partitioned tables

Hi,

On 10/8/21 14:58, Justin Pryzby wrote:

Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.

I went through this patch and I'd like to propose a couple changes, per
the 0002 patch:

1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
strange before, but I'm not a native speaker so maybe it's worse ...

2) Remove unnecessary whitespace changes in perform.sgml.

3) Simplify the analyze.sgml changes a bit - it was trying to cram too
much stuff into a single paragraph, so I split that.

Does that seem OK, or did omit something important?

FWIW I think it's really confusing we have inheritance and partitioning,
and partitions and child tables. And sometimes we use partitioning in
the generic sense (i.e. including the inheritance approach), and
sometimes only the declarative variant. Same for partitions vs child
tables. I can't even imagine how confusing this has to be for people
just learning this stuff. They must be in permanent WTF?! state ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v2-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchtext/x-patch; charset=UTF-8; name=v2-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchDownload+52-14
v2-0002-minor-changes-rewordings.patchtext/x-patch; charset=UTF-8; name=v2-0002-minor-changes-rewordings.patchDownload+38-23
#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#4)
Re: document the need to analyze partitioned tables

Thanks for looking at this

On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:

Hi,

On 10/8/21 14:58, Justin Pryzby wrote:

Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.

I went through this patch and I'd like to propose a couple changes, per the
0002 patch:

1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
strange before, but I'm not a native speaker so maybe it's worse ...

+     autoanalyze on the parent table.  If your queries require statistics on                                                                       
+     parent relations for proper planning, it's necessary to periodically run                                                                      

You added two references to "relations", but everything else talks about
"tables", which is all that analyze processes.

2) Remove unnecessary whitespace changes in perform.sgml.

Those were a note to myself and to any reviewer - should that be updated too ?

3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
stuff into a single paragraph, so I split that.

Does that seem OK, or did omit something important?

+ If the table being analyzed has one or more children,

I think you're referring to both legacy inheritance and and partitioning. That
should be more clear.

+    <command>ANALYZE</command> gathers two sets of statistics: once on the rows
+    of the parent table only, and a second one including rows of both the parent
+    table and all child relations.  This second set of statistics is needed when

I think should say ".. and all of its children".

FWIW I think it's really confusing we have inheritance and partitioning, and
partitions and child tables. And sometimes we use partitioning in the
generic sense (i.e. including the inheritance approach), and sometimes only
the declarative variant. Same for partitions vs child tables. I can't even
imagine how confusing this has to be for people just learning this stuff.
They must be in permanent WTF?! state ...

The docs were cleaned up some in 0c06534bd. At least the word "partitioned"
should never be used for legacy inheritance - but "partitioning" is.

--
Justin

#6Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#5)
Re: document the need to analyze partitioned tables

On 1/21/22 19:02, Justin Pryzby wrote:

Thanks for looking at this

On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:

Hi,

On 10/8/21 14:58, Justin Pryzby wrote:

Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.

I went through this patch and I'd like to propose a couple changes, per the
0002 patch:

1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
strange before, but I'm not a native speaker so maybe it's worse ...

+     autoanalyze on the parent table.  If your queries require statistics on
+     parent relations for proper planning, it's necessary to periodically run

You added two references to "relations", but everything else talks about
"tables", which is all that analyze processes.

Good point, that should use "tables" too.

2) Remove unnecessary whitespace changes in perform.sgml.

Those were a note to myself and to any reviewer - should that be updated too ?

Ah, I see. I don't think that part needs updating - it talks about
having to analyze after a bulk load, and that applies to all tables
anyway. I don't think it needs to mention partitioned tables need an
analyze too.

3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
stuff into a single paragraph, so I split that.

Does that seem OK, or did omit something important?

+ If the table being analyzed has one or more children,

I think you're referring to both legacy inheritance and and partitioning. That
should be more clear.

I think it applies to both types of partitioning - it's just that in the
declarative partitioning case the table is always empty so no stats with
inherit=false are built.

+    <command>ANALYZE</command> gathers two sets of statistics: once on the rows
+    of the parent table only, and a second one including rows of both the parent
+    table and all child relations.  This second set of statistics is needed when

I think should say ".. and all of its children".

OK

FWIW I think it's really confusing we have inheritance and partitioning, and
partitions and child tables. And sometimes we use partitioning in the
generic sense (i.e. including the inheritance approach), and sometimes only
the declarative variant. Same for partitions vs child tables. I can't even
imagine how confusing this has to be for people just learning this stuff.
They must be in permanent WTF?! state ...

The docs were cleaned up some in 0c06534bd. At least the word "partitioned"
should never be used for legacy inheritance - but "partitioning" is.

OK

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v3-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchtext/x-patch; charset=UTF-8; name=v3-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchDownload+52-14
v3-0002-minor-changes-rewordings.patchtext/x-patch; charset=UTF-8; name=v3-0002-minor-changes-rewordings.patchDownload+38-23
#7Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#6)
Re: document the need to analyze partitioned tables

On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

[ new patch ]

This patch is originally by Justin. The latest version is by Tomas. I
think the next step is for Justin to say whether he's OK with the
latest version that Tomas posted. If he is, then I suggest that he
also mark it Ready for Committer, and that Tomas commit it. If he's
not, he should say what he wants changed and either post a new version
himself or wait for Tomas to do that.

I think the fact that is classified as a "Bug Fix" in the CommitFest
application is not particularly good. I would prefer to see it
classified under "Documentation". I'm prepared to concede that
documentation can have bugs as a general matter, but nobody's data is
getting eaten because the documentation wasn't updated. In fact, this
is the fourth patch from the "bug fix" section I've studied this
afternoon, and, well, none of them have been back-patchable code
defects.

--
Robert Haas
EDB: http://www.enterprisedb.com

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Robert Haas (#7)
Re: document the need to analyze partitioned tables

On Mon, Mar 14, 2022 at 05:23:54PM -0400, Robert Haas wrote:

On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

[ new patch ]

This patch is originally by Justin. The latest version is by Tomas. I
think the next step is for Justin to say whether he's OK with the
latest version that Tomas posted. If he is, then I suggest that he
also mark it Ready for Committer, and that Tomas commit it. If he's
not, he should say what he wants changed and either post a new version
himself or wait for Tomas to do that.

Yes, I think it can be Ready. Done.

I amended some of Tomas' changes (see 0003, attached as txt).

@cfbot: the *.patch file is for your consumption, and the others are only there
to show my changes.

I think the fact that is classified as a "Bug Fix" in the CommitFest
application is not particularly good. I would prefer to see it
classified under "Documentation". I'm prepared to concede that
documentation can have bugs as a general matter, but nobody's data is
getting eaten because the documentation wasn't updated. In fact, this
is the fourth patch from the "bug fix" section I've studied this
afternoon, and, well, none of them have been back-patchable code
defects.

In fact, I consider this to be back-patchable back to v10. IMO it's an
omission that this isn't documented. Not all bugs cause data to be eaten. If
someone reads the existing documentation, they might conclude that their
partitioned tables don't need to be analyzed, and they would've been better
served by not reading the docs.

--
Justin

Attachments:

0001-documentation-deficiencies-for-ANALYZE-of-partitione.patchtext/x-diff; charset=us-asciiDownload+58-4
0001-documentation-deficiencies-for-ANALYZE-of-partitione.patch.txttext/plain; charset=us-asciiDownload+50-14
0002-minor-changes-rewordings.patch.txttext/plain; charset=us-asciiDownload+38-21
0003-f-3.patch.txttext/plain; charset=us-asciiDownload+11-12
#9Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#8)
Re: document the need to analyze partitioned tables

On 3/16/22 00:00, Justin Pryzby wrote:

On Mon, Mar 14, 2022 at 05:23:54PM -0400, Robert Haas wrote:

On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

[ new patch ]

This patch is originally by Justin. The latest version is by Tomas. I
think the next step is for Justin to say whether he's OK with the
latest version that Tomas posted. If he is, then I suggest that he
also mark it Ready for Committer, and that Tomas commit it. If he's
not, he should say what he wants changed and either post a new version
himself or wait for Tomas to do that.

Yes, I think it can be Ready. Done.

I amended some of Tomas' changes (see 0003, attached as txt).

@cfbot: the *.patch file is for your consumption, and the others are only there
to show my changes.

I think the fact that is classified as a "Bug Fix" in the CommitFest
application is not particularly good. I would prefer to see it
classified under "Documentation". I'm prepared to concede that
documentation can have bugs as a general matter, but nobody's data is
getting eaten because the documentation wasn't updated. In fact, this
is the fourth patch from the "bug fix" section I've studied this
afternoon, and, well, none of them have been back-patchable code
defects.

In fact, I consider this to be back-patchable back to v10. IMO it's an
omission that this isn't documented. Not all bugs cause data to be eaten. If
someone reads the existing documentation, they might conclude that their
partitioned tables don't need to be analyzed, and they would've been better
served by not reading the docs.

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Daniel Gustafsson
daniel@yesql.se
In reply to: Tomas Vondra (#9)
Re: document the need to analyze partitioned tables

On 28 Mar 2022, at 15:05, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).

I happened to spot a small typo in this commit in the ANALYZE docs, and have
just pushed a fix all the way down to 10 as per the original commit.

--
Daniel Gustafsson https://vmware.com/

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tomas Vondra (#9)
Re: document the need to analyze partitioned tables

On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).

I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.

By the way, where are the statistics of a partitioned tables used? The actual
tables scanned are always the partitions, and in the execution plans that
I have seen, the optimizer always used the statistics of the partitions.

Yours,
Laurenz Albe

Attachments:

0001-Improve-autovacuum-doc-on-partitioned-tables.patchtext/x-patch; charset=UTF-8; name=0001-Improve-autovacuum-doc-on-partitioned-tables.patchDownload+9-5
#12Andrei Lepikhov
lepihov@gmail.com
In reply to: Laurenz Albe (#11)
Re: document the need to analyze partitioned tables

On 10/5/22 13:37, Laurenz Albe wrote:

On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).

I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.

By the way, where are the statistics of a partitioned tables used? The actual
tables scanned are always the partitions, and in the execution plans that
I have seen, the optimizer always used the statistics of the partitions.

For example, it is used to estimate selectivity of join clause:

CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
CREATE TABLE test_0 PARTITION OF test
FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE test_1 PARTITION OF test
FOR VALUES WITH (modulus 2, remainder 1);

INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
VACUUM ANALYZE test;
INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
VACUUM ANALYZE test_0,test_1;

EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
VACUUM ANALYZE test;
EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;

Here without actual statistics on parent table we make wrong prediction.

--
Regards
Andrey Lepikhov
Postgres Professional

#13Nathan Bossart
nathandbossart@gmail.com
In reply to: Laurenz Albe (#11)
Re: document the need to analyze partitioned tables

On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:

On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).

I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.

Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage. But I see what you mean. Readers
might misinterpret this to mean that autovacuum will not process the
partitions. There's a good definition of what the docs mean by
"partitioned table" [0]https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE, but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions." So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.

Here is my take on the wording:

Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.

[0]: https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14Bruce Momjian
bruce@momjian.us
In reply to: Nathan Bossart (#13)
Re: document the need to analyze partitioned tables

On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:

On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:

On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).

I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.

Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage. But I see what you mean. Readers
might misinterpret this to mean that autovacuum will not process the
partitions. There's a good definition of what the docs mean by
"partitioned table" [0], but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions." So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.

Here is my take on the wording:

Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.

Uh, what about autovacuum's handling of partitioned tables? This makes
it sound like it ignores them because it talks about manual ANALYZE.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

#15Justin Pryzby
pryzby@telsasoft.com
In reply to: Bruce Momjian (#14)
Re: document the need to analyze partitioned tables

On Tue, Jan 17, 2023 at 03:53:24PM -0500, Bruce Momjian wrote:

On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:

On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:

On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).

I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.

Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage. But I see what you mean. Readers
might misinterpret this to mean that autovacuum will not process the
partitions. There's a good definition of what the docs mean by
"partitioned table" [0], but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions." So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.

Here is my take on the wording:

Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.

Uh, what about autovacuum's handling of partitioned tables? This makes
it sound like it ignores them because it talks about manual ANALYZE.

If we're referring to the *partitioned* table, then it does ignore them.
See:

|commit 6f8127b7390119c21479f5ce495b7d2168930e82
|Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
|Date: Mon Aug 16 17:27:52 2021 -0400
|
| Revert analyze support for partitioned tables

Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."

--
Justin

#16Bruce Momjian
bruce@momjian.us
In reply to: Justin Pryzby (#15)
Re: document the need to analyze partitioned tables

On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:

On Tue, Jan 17, 2023 at 03:53:24PM -0500, Bruce Momjian wrote:

On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:

Here is my take on the wording:

Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.

Uh, what about autovacuum's handling of partitioned tables? This makes
it sound like it ignores them because it talks about manual ANALYZE.

If we're referring to the *partitioned* table, then it does ignore them.
See:

|commit 6f8127b7390119c21479f5ce495b7d2168930e82
|Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
|Date: Mon Aug 16 17:27:52 2021 -0400
|
| Revert analyze support for partitioned tables

Yes, I see that patch was trying to combine the statistics of individual
partitions into a partitioned table summary.

Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."

Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.

Can someone summarize how bad it is we have no statistics on partitioned
tables? It sounds bad to me.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

#17Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bruce Momjian (#16)
Re: document the need to analyze partitioned tables

On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:

On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:

Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."

Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.

Can someone summarize how bad it is we have no statistics on partitioned
tables?  It sounds bad to me.

Andrey Lepikhov had an example earlier in this thread[1]/messages/by-id/3df5c68b-13aa-53d0-c0ec-ed98e6972e2e@postgrespro.ru. It doesn't take
an exotic query.

Attached is a new version of my patch that tries to improve the wording.

Yours,
Laurenz Albe

[1]: /messages/by-id/3df5c68b-13aa-53d0-c0ec-ed98e6972e2e@postgrespro.ru

Attachments:

0001-Improve-autovacuum-doc-on-partitioned-tables.V2.patchtext/x-patch; charset=UTF-8; name=0001-Improve-autovacuum-doc-on-partitioned-tables.V2.patchDownload+9-3
#18Justin Pryzby
pryzby@telsasoft.com
In reply to: Laurenz Albe (#17)
Re: document the need to analyze partitioned tables

On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:

On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:

On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:

Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."

Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.

That's what 61fa6ca79 aimed to do. Laurenz is suggesting further
clarification.

Can someone summarize how bad it is we have no statistics on partitioned
tables?� It sounds bad to me.

Andrey Lepikhov had an example earlier in this thread[1]. It doesn't take
an exotic query.

Attached is a new version of my patch that tries to improve the wording.

I tweaked this a bit to end up with:

-    Partitioned tables are not processed by autovacuum.  Statistics
-    should be collected by running a manual <command>ANALYZE</command> when it is
+    The leaf partitions of a partitioned table are normal tables and are processed
+    by autovacuum; however, autovacuum does not process the partitioned table itself.
+    This is no problem as far as <command>VACUUM</command> is concerned, since
+    there's no need to vacuum the empty, partitioned table.  But, as mentioned in
+    <xref linkend="vacuum-for-statistics"/>, it also means that autovacuum won't
+    run <command>ANALYZE</command> on the partitioned table.
+    Although statistics are automatically gathered on its leaf partitions, some queries also need
+    statistics on the partitioned table to run optimally.  You should collect statistics by
+    running a manual <command>ANALYZE</command> when the partitioned table is
first populated, and again whenever the distribution of data in its
partitions changes significantly.
</para>

"partitions are normal tables" was techically wrong, as partitions can
also be partitioned.

--
Justin

#19Bruce Momjian
bruce@momjian.us
In reply to: Justin Pryzby (#18)
Re: document the need to analyze partitioned tables

On Wed, Jan 18, 2023 at 11:49:19AM -0600, Justin Pryzby wrote:

On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:

On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:

On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:

Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."

Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.

That's what 61fa6ca79 aimed to do. Laurenz is suggesting further
clarification.

Ah, makes sense, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

#20Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Justin Pryzby (#18)
Re: document the need to analyze partitioned tables

On Wed, 2023-01-18 at 11:49 -0600, Justin Pryzby wrote:

I tweaked this a bit to end up with:

-    Partitioned tables are not processed by autovacuum.  Statistics
-    should be collected by running a manual <command>ANALYZE</command> when it is
+    The leaf partitions of a partitioned table are normal tables and are processed
+    by autovacuum; however, autovacuum does not process the partitioned table itself.
+    This is no problem as far as <command>VACUUM</command> is concerned, since
+    there's no need to vacuum the empty, partitioned table.  But, as mentioned in
+    <xref linkend="vacuum-for-statistics"/>, it also means that autovacuum won't
+    run <command>ANALYZE</command> on the partitioned table.
+    Although statistics are automatically gathered on its leaf partitions, some queries also need
+    statistics on the partitioned table to run optimally.  You should collect statistics by
+    running a manual <command>ANALYZE</command> when the partitioned table is
     first populated, and again whenever the distribution of data in its
     partitions changes significantly.
    </para>

"partitions are normal tables" was techically wrong, as partitions can
also be partitioned.

I am fine with your tweaks. I think this is good to go.

Yours,
Laurenz Albe

#21Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#17)
#22Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bruce Momjian (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#22)
#24Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bruce Momjian (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#24)
#26David Rowley
dgrowleyml@gmail.com
In reply to: Laurenz Albe (#17)
#27Laurenz Albe
laurenz.albe@cybertec.at
In reply to: David Rowley (#26)
#28David Rowley
dgrowleyml@gmail.com
In reply to: Laurenz Albe (#27)
#29David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#28)
#30Daniel Gustafsson
daniel@yesql.se
In reply to: David Rowley (#29)
#31Laurenz Albe
laurenz.albe@cybertec.at
In reply to: David Rowley (#28)
#32Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#31)
#33Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bruce Momjian (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#33)
#35Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bruce Momjian (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#35)
#37Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bruce Momjian (#36)
#38Bruce Momjian
bruce@momjian.us
In reply to: Laurenz Albe (#37)