Should 'sum(mvf)' read 'sum(mcv)'...?

Started by PG Bug reporting formover 3 years ago9 messagesdocs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
Description:

About halfway down this page
https://www.postgresql.org/docs/current/row-estimation-examples.html we see
the following formula for calculating selectivity:

selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)

And just below the formula we see the explanatory sentence saying:

That is, add up all the frequencies for the MCVs and subtract them from

one, ...

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?

Many thanks,
Eric Mutta.

#2Julien Rouhaud
rjuju123@gmail.com
In reply to: PG Bug reporting form (#1)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

Hi,

On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
Description:

About halfway down this page
https://www.postgresql.org/docs/current/row-estimation-examples.html we see
the following formula for calculating selectivity:

selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)

And just below the formula we see the explanatory sentence saying:

That is, add up all the frequencies for the MCVs and subtract them from

one, ...

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?

It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.

#3Daniel Gustafsson
daniel@yesql.se
In reply to: Julien Rouhaud (#2)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

On 22 Aug 2022, at 09:48, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?

It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.

That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
CC:ing Tom for a trip down memory lane.

Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml. ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?

--
Daniel Gustafsson https://vmware.com/

Attachments:

mcx_acronyms.diffapplication/octet-stream; name=mcx_acronyms.diff; x-unix-mode=0644Download+18-0
#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Daniel Gustafsson (#3)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

Hi,

On Mon, Aug 22, 2022 at 11:13:38AM +0200, Daniel Gustafsson wrote:

On 22 Aug 2022, at 09:48, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?

It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.

That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
CC:ing Tom for a trip down memory lane.

That was actually introduced 2 years before in 234d50812c8 by Bruce.

Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml. ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?

Agreed, although MCF is only used in planstats.sgml and the acronym defined
locally.

#5Daniel Gustafsson
daniel@yesql.se
In reply to: Julien Rouhaud (#4)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

On 22 Aug 2022, at 12:08, Julien Rouhaud <rjuju123@gmail.com> wrote:

That was actually introduced 2 years before in 234d50812c8 by Bruce.

Yes, I was unclear, I meant that the second use was by Tom (whom I also missed
to CC as I said I would so doing that now).

--
Daniel Gustafsson https://vmware.com/

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#2)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

Julien Rouhaud <rjuju123@gmail.com> writes:

On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?

It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.

I don't think it's a typo exactly, but an odd abbreviation for "Most
common Values' Frequencies". (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)

I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".

regards, tom lane

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#6)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

On 22 Aug 2022, at 14:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Julien Rouhaud <rjuju123@gmail.com> writes:

On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?

It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.

I don't think it's a typo exactly, but an odd abbreviation for "Most
common Values' Frequencies". (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)

I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".

I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better. The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).

--
Daniel Gustafsson

Attachments:

mcf_mcv.diffapplication/octet-stream; name=mcf_mcv.diff; x-unix-mode=0644Download+19-1
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#7)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

Daniel Gustafsson <daniel@yesql.se> writes:

I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better. The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).

mcv_freqs looks good. I'd write the glossary entries as singular
(Most Common Frequency, Most Common Value) since our typical usage
is to pluralize them at the point of use ("MCVs"). Also, just
expanding the acronym doesn't seem that helpful. Maybe more like

MCF

Most Common Frequency, that is the frequency associated
with some Most Common Value

MCV

Most Common Value, one of the values appearing most often
within a particular table column

regards, tom lane

#9Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#8)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?

On 12 Apr 2023, at 14:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better. The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).

mcv_freqs looks good. I'd write the glossary entries as singular
(Most Common Frequency, Most Common Value) since our typical usage
is to pluralize them at the point of use ("MCVs"). Also, just
expanding the acronym doesn't seem that helpful. Maybe more like

Pushed with your suggested changes, thanks!

--
Daniel Gustafsson