Should 'sum(mvf)' read 'sum(mcv)'...?
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
Description:
About halfway down this page
https://www.postgresql.org/docs/current/row-estimation-examples.html we see
the following formula for calculating selectivity:
selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
And just below the formula we see the explanatory sentence saying:
That is, add up all the frequencies for the MCVs and subtract them from
one, ...
It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?
Many thanks,
Eric Mutta.
Hi,
On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
Description:About halfway down this page
https://www.postgresql.org/docs/current/row-estimation-examples.html we see
the following formula for calculating selectivity:selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
And just below the formula we see the explanatory sentence saying:
That is, add up all the frequencies for the MCVs and subtract them from
one, ...
It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?
It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.
On 22 Aug 2022, at 09:48, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.
That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
CC:ing Tom for a trip down memory lane.
Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml. ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?
--
Daniel Gustafsson https://vmware.com/
Attachments:
mcx_acronyms.diffapplication/octet-stream; name=mcx_acronyms.diff; x-unix-mode=0644Download+18-0
Hi,
On Mon, Aug 22, 2022 at 11:13:38AM +0200, Daniel Gustafsson wrote:
On 22 Aug 2022, at 09:48, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
CC:ing Tom for a trip down memory lane.
That was actually introduced 2 years before in 234d50812c8 by Bruce.
Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml. ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?
Agreed, although MCF is only used in planstats.sgml and the acronym defined
locally.
On 22 Aug 2022, at 12:08, Julien Rouhaud <rjuju123@gmail.com> wrote:
That was actually introduced 2 years before in 234d50812c8 by Bruce.
Yes, I was unclear, I meant that the second use was by Tom (whom I also missed
to CC as I said I would so doing that now).
--
Daniel Gustafsson https://vmware.com/
Julien Rouhaud <rjuju123@gmail.com> writes:
On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?
It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.
I don't think it's a typo exactly, but an odd abbreviation for "Most
common Values' Frequencies". (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)
I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".
regards, tom lane
On 22 Aug 2022, at 14:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.I don't think it's a typo exactly, but an odd abbreviation for "Most
common Values' Frequencies". (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".
I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better. The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).
--
Daniel Gustafsson
Attachments:
mcf_mcv.diffapplication/octet-stream; name=mcf_mcv.diff; x-unix-mode=0644Download+19-1
Daniel Gustafsson <daniel@yesql.se> writes:
I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better. The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).
mcv_freqs looks good. I'd write the glossary entries as singular
(Most Common Frequency, Most Common Value) since our typical usage
is to pluralize them at the point of use ("MCVs"). Also, just
expanding the acronym doesn't seem that helpful. Maybe more like
MCF
Most Common Frequency, that is the frequency associated
with some Most Common Value
MCV
Most Common Value, one of the values appearing most often
within a particular table column
regards, tom lane
On 12 Apr 2023, at 14:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better. The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).mcv_freqs looks good. I'd write the glossary entries as singular
(Most Common Frequency, Most Common Value) since our typical usage
is to pluralize them at the point of use ("MCVs"). Also, just
expanding the acronym doesn't seem that helpful. Maybe more like
Pushed with your suggested changes, thanks!
--
Daniel Gustafsson