document the need to analyze partitioned tables
Adding -hackers, sorry for the duplicate.
This seems to be deficient, citing
/messages/by-id/0d1b394b-bec9-8a71-a336-44df7078b295@gmail.com
I'm proposing something like the attached. Ideally, there would be a central
place to put details, and the other places could refer to that.
Since the autoanalyze patch was reverted, this should be easily applied to
backbranches, which is probably most of its value.
commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu Jul 22 16:06:18 2021 -0500
documentation deficiencies for ANALYZE of partitioned tables
This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5..decfabff5d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
to meaningful statistical changes.
</para>
+ <para>
+ Tuples changed in partitions and inheritence children do not count towards
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum. It is necessary to
+ periodically run an manual <command>ANALYZE</command> to keep the statistics
+ of the table hierarchy up to date.
+ </para>
+
<para>
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
<command>ANALYZE</command> commands on those tables on a suitable schedule.
</para>
</tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+ partitioned tables. Inheritence parents will only be analyzed if the
+ parent is changed - changes to child tables do not trigger autoanalyze on
+ the parent table. It is necessary to periodically run an manual
+ <command>ANALYZE</command> to keep the statistics of the table hierarchy up to
+ date.
+ </para>
+ </tip>
+
</sect2>
<sect2 id="vacuum-for-visibility-map">
@@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</programlisting>
is compared to the total number of tuples inserted, updated, or deleted
since the last <command>ANALYZE</command>.
+
+ Partitioned tables are not processed by autovacuum, and their statistics
+ should be updated by manually running <command>ANALYZE</command> when the
+ table is first populated, and whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and updated whenever the distribution of data in its
+ partitions changes significantly.
</para>
<para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..b84853fd6f 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
<title>Run <command>ANALYZE</command> Afterwards</title>
<para>
+
Whenever you have significantly altered the distribution of data
within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This
includes bulk loading large amounts of data into the table. Running
+
<command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
ensures that the planner has up-to-date statistics about the
table. With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c423aeeea5..20ffbc2d7a 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
</para>
<para>
- If the table being analyzed has one or more children,
- <command>ANALYZE</command> will gather statistics twice: once on the
- rows of the parent table only, and a second time on the rows of the
- parent table with all of its children. This second set of statistics
- is needed when planning queries that traverse the entire inheritance
- tree. The autovacuum daemon, however, will only consider inserts or
- updates on the parent table itself when deciding whether to trigger an
- automatic analyze for that table. If that table is rarely inserted into
- or updated, the inheritance statistics will not be up to date unless you
- run <command>ANALYZE</command> manually.
+ If the table being analyzed is partitioned, <command>ANALYZE</command>
+ will gather statistics by sampling blocks randomly from its partitions;
+ in addition, it will recurse into each partition and update its statistics.
+ (However, in multi-level partitioning scenarios, each leaf partition
+ will only be analyzed once.)
+ By constrast, if the table being analyzed has inheritance children,
+ <command>ANALYZE</command> will gather statistics for it twice:
+ once on the rows of the parent table only, and a second time on the
+ rows of the parent table with all of its children. This second set of
+ statistics is needed when planning queries that traverse the entire
+ inheritance tree. The child tables themselves are not individually
+ analyzed in this case.
</para>
<para>
- If any of the child tables are foreign tables whose foreign data wrappers
- do not support <command>ANALYZE</command>, those child tables are ignored while
- gathering inheritance statistics.
+ The autovacuum daemon does not process partitioned tables or inheritence
+ parents. It is usually necessary to periodically run a manual
+ <command>ANALYZE</command> to keep the statistics of the table hierarchy
+ up to date (except for nonempty inheritence parents which undergo
+ modifications of their own table data).
+ See...
+ </para>
+
+ <para>
+ If any of the child tables or partitions are foreign tables whose foreign
+ data wrappers do not support <command>ANALYZE</command>, those tables are
+ ignored while gathering inheritance statistics.
</para>
<para>
On Sun, Sep 12, 2021 at 8:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Adding -hackers, sorry for the duplicate.
This seems to be deficient, citing
/messages/by-id/0d1b394b-bec9-8a71-a336-44df7078b295@gmail.com
I'm proposing something like the attached. Ideally, there would be a
central
place to put details, and the other places could refer to that.Since the autoanalyze patch was reverted, this should be easily applied to
backbranches, which is probably most of its value.commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu Jul 22 16:06:18 2021 -0500documentation deficiencies for ANALYZE of partitioned tables
This is partially extracted from
1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 36f975b1e5..decfabff5d 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -290,6 +290,14 @@ to meaningful statistical changes. </para>+ <para> + Tuples changed in partitions and inheritence children do not count towards + analyze on the parent table. If the parent table is empty or rarely + changed, it may never be processed by autovacuum. It is necessary to + periodically run an manual <command>ANALYZE</command> to keep the statistics + of the table hierarchy up to date. + </para> + <para> As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily-updated tables than for seldom-updated @@ -347,6 +355,18 @@ <command>ANALYZE</command> commands on those tables on a suitable schedule. </para> </tip> + + <tip> + <para> + The autovacuum daemon does not issue <command>ANALYZE</command> commands for + partitioned tables. Inheritence parents will only be analyzed if the + parent is changed - changes to child tables do not trigger autoanalyze on + the parent table. It is necessary to periodically run an manual + <command>ANALYZE</command> to keep the statistics of the table hierarchy up to + date. + </para> + </tip> + </sect2><sect2 id="vacuum-for-visibility-map"> @@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu </programlisting> is compared to the total number of tuples inserted, updated, or deleted since the last <command>ANALYZE</command>. + + Partitioned tables are not processed by autovacuum, and their statistics + should be updated by manually running <command>ANALYZE</command> when the + table is first populated, and whenever the distribution of data in its + partitions changes significantly. + </para> + + <para> + Partitioned tables are not processed by autovacuum. Statistics + should be collected by running a manual <command>ANALYZE</command> when it is + first populated, and updated whenever the distribution of data in its + partitions changes significantly. </para><para> diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index 89ff58338e..b84853fd6f 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; <title>Run <command>ANALYZE</command> Afterwards</title><para> + Whenever you have significantly altered the distribution of data within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This includes bulk loading large amounts of data into the table. Running + <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>) ensures that the planner has up-to-date statistics about the table. With no statistics or obsolete statistics, the planner might diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml index c423aeeea5..20ffbc2d7a 100644 --- a/doc/src/sgml/ref/analyze.sgml +++ b/doc/src/sgml/ref/analyze.sgml @@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea </para><para> - If the table being analyzed has one or more children, - <command>ANALYZE</command> will gather statistics twice: once on the - rows of the parent table only, and a second time on the rows of the - parent table with all of its children. This second set of statistics - is needed when planning queries that traverse the entire inheritance - tree. The autovacuum daemon, however, will only consider inserts or - updates on the parent table itself when deciding whether to trigger an - automatic analyze for that table. If that table is rarely inserted into - or updated, the inheritance statistics will not be up to date unless you - run <command>ANALYZE</command> manually. + If the table being analyzed is partitioned, <command>ANALYZE</command> + will gather statistics by sampling blocks randomly from its partitions; + in addition, it will recurse into each partition and update its statistics. + (However, in multi-level partitioning scenarios, each leaf partition + will only be analyzed once.) + By constrast, if the table being analyzed has inheritance children, + <command>ANALYZE</command> will gather statistics for it twice: + once on the rows of the parent table only, and a second time on the + rows of the parent table with all of its children. This second set of + statistics is needed when planning queries that traverse the entire + inheritance tree. The child tables themselves are not individually + analyzed in this case. </para><para> - If any of the child tables are foreign tables whose foreign data wrappers - do not support <command>ANALYZE</command>, those child tables are ignored while - gathering inheritance statistics. + The autovacuum daemon does not process partitioned tables or inheritence + parents. It is usually necessary to periodically run a manual + <command>ANALYZE</command> to keep the statistics of the table hierarchy + up to date (except for nonempty inheritence parents which undergo + modifications of their own table data). + See... + </para> + + <para> + If any of the child tables or partitions are foreign tables whose foreign + data wrappers do not support <command>ANALYZE</command>, those tables are + ignored while gathering inheritance statistics. </para><para>
Hi,
Minor comment:
periodically run an manual -> periodically run a manual
Cheers
Cleaned up and attached as a .patch.
The patch implementing autoanalyze on partitioned tables should revert relevant
portions of this patch.
Attachments:
0001-documentation-deficiencies-for-ANALYZE-of-partitione.patchtext/x-diff; charset=us-asciiDownload+52-14
Hi,
On 10/8/21 14:58, Justin Pryzby wrote:
Cleaned up and attached as a .patch.
The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.
I went through this patch and I'd like to propose a couple changes, per
the 0002 patch:
1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
strange before, but I'm not a native speaker so maybe it's worse ...
2) Remove unnecessary whitespace changes in perform.sgml.
3) Simplify the analyze.sgml changes a bit - it was trying to cram too
much stuff into a single paragraph, so I split that.
Does that seem OK, or did omit something important?
FWIW I think it's really confusing we have inheritance and partitioning,
and partitions and child tables. And sometimes we use partitioning in
the generic sense (i.e. including the inheritance approach), and
sometimes only the declarative variant. Same for partitions vs child
tables. I can't even imagine how confusing this has to be for people
just learning this stuff. They must be in permanent WTF?! state ...
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
v2-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchtext/x-patch; charset=UTF-8; name=v2-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchDownload+52-14
v2-0002-minor-changes-rewordings.patchtext/x-patch; charset=UTF-8; name=v2-0002-minor-changes-rewordings.patchDownload+38-23
Thanks for looking at this
On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:
Hi,
On 10/8/21 14:58, Justin Pryzby wrote:
Cleaned up and attached as a .patch.
The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.I went through this patch and I'd like to propose a couple changes, per the
0002 patch:1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
strange before, but I'm not a native speaker so maybe it's worse ...
+ autoanalyze on the parent table. If your queries require statistics on
+ parent relations for proper planning, it's necessary to periodically run
You added two references to "relations", but everything else talks about
"tables", which is all that analyze processes.
2) Remove unnecessary whitespace changes in perform.sgml.
Those were a note to myself and to any reviewer - should that be updated too ?
3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
stuff into a single paragraph, so I split that.Does that seem OK, or did omit something important?
+ If the table being analyzed has one or more children,
I think you're referring to both legacy inheritance and and partitioning. That
should be more clear.
+ <command>ANALYZE</command> gathers two sets of statistics: once on the rows
+ of the parent table only, and a second one including rows of both the parent
+ table and all child relations. This second set of statistics is needed when
I think should say ".. and all of its children".
FWIW I think it's really confusing we have inheritance and partitioning, and
partitions and child tables. And sometimes we use partitioning in the
generic sense (i.e. including the inheritance approach), and sometimes only
the declarative variant. Same for partitions vs child tables. I can't even
imagine how confusing this has to be for people just learning this stuff.
They must be in permanent WTF?! state ...
The docs were cleaned up some in 0c06534bd. At least the word "partitioned"
should never be used for legacy inheritance - but "partitioning" is.
--
Justin
On 1/21/22 19:02, Justin Pryzby wrote:
Thanks for looking at this
On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:
Hi,
On 10/8/21 14:58, Justin Pryzby wrote:
Cleaned up and attached as a .patch.
The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.I went through this patch and I'd like to propose a couple changes, per the
0002 patch:1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
strange before, but I'm not a native speaker so maybe it's worse ...+ autoanalyze on the parent table. If your queries require statistics on + parent relations for proper planning, it's necessary to periodically runYou added two references to "relations", but everything else talks about
"tables", which is all that analyze processes.
Good point, that should use "tables" too.
2) Remove unnecessary whitespace changes in perform.sgml.
Those were a note to myself and to any reviewer - should that be updated too ?
Ah, I see. I don't think that part needs updating - it talks about
having to analyze after a bulk load, and that applies to all tables
anyway. I don't think it needs to mention partitioned tables need an
analyze too.
3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
stuff into a single paragraph, so I split that.Does that seem OK, or did omit something important?
+ If the table being analyzed has one or more children,
I think you're referring to both legacy inheritance and and partitioning. That
should be more clear.
I think it applies to both types of partitioning - it's just that in the
declarative partitioning case the table is always empty so no stats with
inherit=false are built.
+ <command>ANALYZE</command> gathers two sets of statistics: once on the rows + of the parent table only, and a second one including rows of both the parent + table and all child relations. This second set of statistics is needed whenI think should say ".. and all of its children".
OK
FWIW I think it's really confusing we have inheritance and partitioning, and
partitions and child tables. And sometimes we use partitioning in the
generic sense (i.e. including the inheritance approach), and sometimes only
the declarative variant. Same for partitions vs child tables. I can't even
imagine how confusing this has to be for people just learning this stuff.
They must be in permanent WTF?! state ...The docs were cleaned up some in 0c06534bd. At least the word "partitioned"
should never be used for legacy inheritance - but "partitioning" is.
OK
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
v3-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchtext/x-patch; charset=UTF-8; name=v3-0001-documentation-deficiencies-for-ANALYZE-of-partiti.patchDownload+52-14
v3-0002-minor-changes-rewordings.patchtext/x-patch; charset=UTF-8; name=v3-0002-minor-changes-rewordings.patchDownload+38-23
On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
[ new patch ]
This patch is originally by Justin. The latest version is by Tomas. I
think the next step is for Justin to say whether he's OK with the
latest version that Tomas posted. If he is, then I suggest that he
also mark it Ready for Committer, and that Tomas commit it. If he's
not, he should say what he wants changed and either post a new version
himself or wait for Tomas to do that.
I think the fact that is classified as a "Bug Fix" in the CommitFest
application is not particularly good. I would prefer to see it
classified under "Documentation". I'm prepared to concede that
documentation can have bugs as a general matter, but nobody's data is
getting eaten because the documentation wasn't updated. In fact, this
is the fourth patch from the "bug fix" section I've studied this
afternoon, and, well, none of them have been back-patchable code
defects.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Mar 14, 2022 at 05:23:54PM -0400, Robert Haas wrote:
On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
[ new patch ]
This patch is originally by Justin. The latest version is by Tomas. I
think the next step is for Justin to say whether he's OK with the
latest version that Tomas posted. If he is, then I suggest that he
also mark it Ready for Committer, and that Tomas commit it. If he's
not, he should say what he wants changed and either post a new version
himself or wait for Tomas to do that.
Yes, I think it can be Ready. Done.
I amended some of Tomas' changes (see 0003, attached as txt).
@cfbot: the *.patch file is for your consumption, and the others are only there
to show my changes.
I think the fact that is classified as a "Bug Fix" in the CommitFest
application is not particularly good. I would prefer to see it
classified under "Documentation". I'm prepared to concede that
documentation can have bugs as a general matter, but nobody's data is
getting eaten because the documentation wasn't updated. In fact, this
is the fourth patch from the "bug fix" section I've studied this
afternoon, and, well, none of them have been back-patchable code
defects.
In fact, I consider this to be back-patchable back to v10. IMO it's an
omission that this isn't documented. Not all bugs cause data to be eaten. If
someone reads the existing documentation, they might conclude that their
partitioned tables don't need to be analyzed, and they would've been better
served by not reading the docs.
--
Justin
Attachments:
0001-documentation-deficiencies-for-ANALYZE-of-partitione.patchtext/x-diff; charset=us-asciiDownload+58-4
0001-documentation-deficiencies-for-ANALYZE-of-partitione.patch.txttext/plain; charset=us-asciiDownload+50-14
0002-minor-changes-rewordings.patch.txttext/plain; charset=us-asciiDownload+38-21
0003-f-3.patch.txttext/plain; charset=us-asciiDownload+11-12
On 3/16/22 00:00, Justin Pryzby wrote:
On Mon, Mar 14, 2022 at 05:23:54PM -0400, Robert Haas wrote:
On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
[ new patch ]
This patch is originally by Justin. The latest version is by Tomas. I
think the next step is for Justin to say whether he's OK with the
latest version that Tomas posted. If he is, then I suggest that he
also mark it Ready for Committer, and that Tomas commit it. If he's
not, he should say what he wants changed and either post a new version
himself or wait for Tomas to do that.Yes, I think it can be Ready. Done.
I amended some of Tomas' changes (see 0003, attached as txt).
@cfbot: the *.patch file is for your consumption, and the others are only there
to show my changes.I think the fact that is classified as a "Bug Fix" in the CommitFest
application is not particularly good. I would prefer to see it
classified under "Documentation". I'm prepared to concede that
documentation can have bugs as a general matter, but nobody's data is
getting eaten because the documentation wasn't updated. In fact, this
is the fourth patch from the "bug fix" section I've studied this
afternoon, and, well, none of them have been back-patchable code
defects.In fact, I consider this to be back-patchable back to v10. IMO it's an
omission that this isn't documented. Not all bugs cause data to be eaten. If
someone reads the existing documentation, they might conclude that their
partitioned tables don't need to be analyzed, and they would've been better
served by not reading the docs.
I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28 Mar 2022, at 15:05, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).
I happened to spot a small typo in this commit in the ANALYZE docs, and have
just pushed a fix all the way down to 10 as per the original commit.
--
Daniel Gustafsson https://vmware.com/
On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).
I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.
By the way, where are the statistics of a partitioned tables used? The actual
tables scanned are always the partitions, and in the execution plans that
I have seen, the optimizer always used the statistics of the partitions.
Yours,
Laurenz Albe
Attachments:
0001-Improve-autovacuum-doc-on-partitioned-tables.patchtext/x-patch; charset=UTF-8; name=0001-Improve-autovacuum-doc-on-partitioned-tables.patchDownload+9-5
On 10/5/22 13:37, Laurenz Albe wrote:
On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.By the way, where are the statistics of a partitioned tables used? The actual
tables scanned are always the partitions, and in the execution plans that
I have seen, the optimizer always used the statistics of the partitions.
For example, it is used to estimate selectivity of join clause:
CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
CREATE TABLE test_0 PARTITION OF test
FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE test_1 PARTITION OF test
FOR VALUES WITH (modulus 2, remainder 1);
INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
VACUUM ANALYZE test;
INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
VACUUM ANALYZE test_0,test_1;
EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
VACUUM ANALYZE test;
EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
Here without actual statistics on parent table we make wrong prediction.
--
Regards
Andrey Lepikhov
Postgres Professional
On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.
Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage. But I see what you mean. Readers
might misinterpret this to mean that autovacuum will not process the
partitions. There's a good definition of what the docs mean by
"partitioned table" [0]https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE, but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions." So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.
Here is my take on the wording:
Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.
[0]: https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:
On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage. But I see what you mean. Readers
might misinterpret this to mean that autovacuum will not process the
partitions. There's a good definition of what the docs mean by
"partitioned table" [0], but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions." So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.Here is my take on the wording:
Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.
Uh, what about autovacuum's handling of partitioned tables? This makes
it sound like it ignores them because it talks about manual ANALYZE.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
On Tue, Jan 17, 2023 at 03:53:24PM -0500, Bruce Momjian wrote:
On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:
On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).I'd like to suggest an improvement to this. The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage. But I see what you mean. Readers
might misinterpret this to mean that autovacuum will not process the
partitions. There's a good definition of what the docs mean by
"partitioned table" [0], but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions." So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.Here is my take on the wording:
Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.Uh, what about autovacuum's handling of partitioned tables? This makes
it sound like it ignores them because it talks about manual ANALYZE.
If we're referring to the *partitioned* table, then it does ignore them.
See:
|commit 6f8127b7390119c21479f5ce495b7d2168930e82
|Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
|Date: Mon Aug 16 17:27:52 2021 -0400
|
| Revert analyze support for partitioned tables
Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."
--
Justin
On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
On Tue, Jan 17, 2023 at 03:53:24PM -0500, Bruce Momjian wrote:
On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:
Here is my take on the wording:
Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.Uh, what about autovacuum's handling of partitioned tables? This makes
it sound like it ignores them because it talks about manual ANALYZE.If we're referring to the *partitioned* table, then it does ignore them.
See:|commit 6f8127b7390119c21479f5ce495b7d2168930e82
|Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
|Date: Mon Aug 16 17:27:52 2021 -0400
|
| Revert analyze support for partitioned tables
Yes, I see that patch was trying to combine the statistics of individual
partitions into a partitioned table summary.
Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."
Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.
Can someone summarize how bad it is we have no statistics on partitioned
tables? It sounds bad to me.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.Can someone summarize how bad it is we have no statistics on partitioned
tables? It sounds bad to me.
Andrey Lepikhov had an example earlier in this thread[1]/messages/by-id/3df5c68b-13aa-53d0-c0ec-ed98e6972e2e@postgrespro.ru. It doesn't take
an exotic query.
Attached is a new version of my patch that tries to improve the wording.
Yours,
Laurenz Albe
[1]: /messages/by-id/3df5c68b-13aa-53d0-c0ec-ed98e6972e2e@postgrespro.ru
Attachments:
0001-Improve-autovacuum-doc-on-partitioned-tables.V2.patchtext/x-patch; charset=UTF-8; name=0001-Improve-autovacuum-doc-on-partitioned-tables.V2.patchDownload+9-3
On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:
On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.
That's what 61fa6ca79 aimed to do. Laurenz is suggesting further
clarification.
Can someone summarize how bad it is we have no statistics on partitioned
tables?� It sounds bad to me.Andrey Lepikhov had an example earlier in this thread[1]. It doesn't take
an exotic query.Attached is a new version of my patch that tries to improve the wording.
I tweaked this a bit to end up with:
- Partitioned tables are not processed by autovacuum. Statistics - should be collected by running a manual <command>ANALYZE</command> when it is + The leaf partitions of a partitioned table are normal tables and are processed + by autovacuum; however, autovacuum does not process the partitioned table itself. + This is no problem as far as <command>VACUUM</command> is concerned, since + there's no need to vacuum the empty, partitioned table. But, as mentioned in + <xref linkend="vacuum-for-statistics"/>, it also means that autovacuum won't + run <command>ANALYZE</command> on the partitioned table. + Although statistics are automatically gathered on its leaf partitions, some queries also need + statistics on the partitioned table to run optimally. You should collect statistics by + running a manual <command>ANALYZE</command> when the partitioned table is first populated, and again whenever the distribution of data in its partitions changes significantly. </para>
"partitions are normal tables" was techically wrong, as partitions can
also be partitioned.
--
Justin
On Wed, Jan 18, 2023 at 11:49:19AM -0600, Justin Pryzby wrote:
On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:
On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.That's what 61fa6ca79 aimed to do. Laurenz is suggesting further
clarification.
Ah, makes sense, thanks.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
On Wed, 2023-01-18 at 11:49 -0600, Justin Pryzby wrote:
I tweaked this a bit to end up with:
- Partitioned tables are not processed by autovacuum. Statistics - should be collected by running a manual <command>ANALYZE</command> when it is + The leaf partitions of a partitioned table are normal tables and are processed + by autovacuum; however, autovacuum does not process the partitioned table itself. + This is no problem as far as <command>VACUUM</command> is concerned, since + there's no need to vacuum the empty, partitioned table. But, as mentioned in + <xref linkend="vacuum-for-statistics"/>, it also means that autovacuum won't + run <command>ANALYZE</command> on the partitioned table. + Although statistics are automatically gathered on its leaf partitions, some queries also need + statistics on the partitioned table to run optimally. You should collect statistics by + running a manual <command>ANALYZE</command> when the partitioned table is first populated, and again whenever the distribution of data in its partitions changes significantly. </para>"partitions are normal tables" was techically wrong, as partitions can
also be partitioned.
I am fine with your tweaks. I think this is good to go.
Yours,
Laurenz Albe