v10 release notes for extended stats

Started by Justin Pryzby over 5 years ago, 3 messages, list: hackers
#1 Justin Pryzby
pryzby@telsasoft.com

2017-03-24 [7b504eb28] Implement multivariate n-distinct coefficients
2017-04-05 [2686ee1b7] Collect and use multi-column dependency stats
2017-05-12 [bc085205c] Change CREATE STATISTICS syntax

The existing notes say:
|Add multi-column optimizer statistics to compute the correlation ratio and number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera)
|New commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
|This feature is helpful in estimating query memory usage and when combining the statistics from individual columns.

"correlation ratio" is referring to stxkind=d (dependencies), right ? That's
very unclear.

"helpful in estimating query memory usage": I guess it means that this allows
the planner to correctly account for large vs small number of GROUP BY values,
but it sounds more like it's going to help a user to estimate memory use.

"when combining the statistics from individual columns." this is referring to
stxkind=d, handling correlated/redundant clauses, but it'd be hard for a user
to know that.

Also, maybe it should say "combining stats from columns OF THE SAME TABLE".

So I propose:
|Allow creation of multi-column statistics objects, for computing the
|dependencies between columns and number of distinct values of combinations of columns
|(Tomas Vondra, David Rowley, Álvaro Herrera)
|The new commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
|Improved statistics allow the planner to generate better query plans with more accurate
|estimates of the row count and memory usage when grouping by multiple
|columns, and more accurate estimates of the row count if WHERE clauses apply
|to multiple columns and values of some columns are correlated with values of
|other columns.
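
To make the two cases in the proposed text concrete, here is a minimal sketch in Python. It is a simplified model, not the planner's actual code: the table, the column names a and b, and the rule "b is determined by a" are all hypothetical. It compares the estimates obtained by treating the columns as independent with the true counts, which is the gap that dependency and n-distinct statistics are meant to close.

```python
# Hypothetical table of 1000 rows (a, b) in which b is fully determined by a
# (b = a // 10). This only models the arithmetic; it is not PostgreSQL code.
rows = [(i % 100, (i % 100) // 10) for i in range(1000)]
n = len(rows)

# Case 1: WHERE a = 42 AND b = 4
count_a = sum(1 for a, b in rows if a == 42)   # rows matching a = 42
count_b = sum(1 for a, b in rows if b == 4)    # rows matching b = 4
# Per-column statistics only: multiply the selectivities as if independent.
independent_estimate = count_a * count_b / n
actual_rows = sum(1 for a, b in rows if a == 42 and b == 4)

# Case 2: GROUP BY a, b
nd_a = len({a for a, b in rows})               # distinct values of a
nd_b = len({b for a, b in rows})               # distinct values of b
# Per-column statistics only: product of distinct counts, capped at row count.
naive_groups = min(nd_a * nd_b, n)
actual_groups = len(set(rows))                 # distinct (a, b) combinations

print("WHERE estimate:", independent_estimate, "actual:", actual_rows)
print("GROUP BY estimate:", naive_groups, "actual:", actual_groups)
```

In this toy data the independence assumption underestimates the WHERE row count by a factor of ten (1 row versus 10) and overestimates the number of groups by a factor of ten (1000 versus 100).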

#2 Bruce Momjian
bruce@momjian.us
In reply to: Justin Pryzby (#1)
Re: v10 release notes for extended stats

On Sat, Dec 19, 2020 at 01:39:27PM -0600, Justin Pryzby wrote:

> 2017-03-24 [7b504eb28] Implement multivariate n-distinct coefficients
> 2017-04-05 [2686ee1b7] Collect and use multi-column dependency stats
> 2017-05-12 [bc085205c] Change CREATE STATISTICS syntax
>
> The existing notes say:
> |Add multi-column optimizer statistics to compute the correlation ratio and number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera)
> |New commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
> |This feature is helpful in estimating query memory usage and when combining the statistics from individual columns.
>
> "correlation ratio" is referring to stxkind=d (dependencies), right ? That's
> very unclear.
>
> "helpful in estimating query memory usage": I guess it means that this allows
> the planner to correctly account for large vs small number of GROUP BY values,
> but it sounds more like it's going to help a user to estimate memory use.
>
> "when combining the statistics from individual columns." this is referring to
> stxkind=d, handling correlated/redundant clauses, but it'd be hard for a user
> to know that.
>
> Also, maybe it should say "combining stats from columns OF THE SAME TABLE".
>
> So I propose:
> |Allow creation of multi-column statistics objects, for computing the
> |dependencies between columns and number of distinct values of combinations of columns
> |(Tomas Vondra, David Rowley, Álvaro Herrera)
> |The new commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
> |Improved statistics allow the planner to generate better query plans with more accurate
> |estimates of the row count and memory usage when grouping by multiple
> |columns, and more accurate estimates of the row count if WHERE clauses apply
> |to multiple columns and values of some columns are correlated with values of
> |other columns.

Uh, at the time, that was the best text we could come up with. We don't
usually go back to update them unless there is a very good reason, and I
am not seeing that above.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: v10 release notes for extended stats

Bruce Momjian <bruce@momjian.us> writes:

> On Sat, Dec 19, 2020 at 01:39:27PM -0600, Justin Pryzby wrote:
>> So I propose:

> Uh, at the time, that was the best text we could come up with. We don't
> usually go back to update them unless there is a very good reason, and I
> am not seeing that above.

Yeah, it's a couple years too late to be worth spending effort on
improving the v10 notes, I fear. If there's text in the main
documentation that could be improved, that's a different story.

regards, tom lane