"Bug" in statistics for v7.2?

Started by Marc G. Fournieralmost 24 years ago5 messages
#1Marc G. Fournier
scrappy@hub.org

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?

But, I have one field that has 5 distinct values:

iwantu=# select distinct(profiles_faith) from iwantu_profiles;
profiles_faith
----------------
0
1
2
7
8
(5 rows)

But pg_stats is reporting 1:

tablename | attname | avg_width | n_distinct
-----------------+------------------------+-----------+------------
iwantu_profiles | profiles_faith | 2 | 1

So am I reading n_distinct wrong?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc G. Fournier (#1)
Re: "Bug" in statistics for v7.2?

"Marc G. Fournier" <scrappy@hub.org> writes:

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?
But, I have one field that has 5 distinct values:
But pg_stats is reporting 1:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

regards, tom lane

#3Marc G. Fournier
scrappy@hub.org
In reply to: Tom Lane (#2)
Re: "Bug" in statistics for v7.2?

That explains it ...

profiles_faith | count
----------------+--------
0 | 485938
1 | 2
2 | 6
7 | 2
8 | 21
(5 rows)

Cool, another waste of space *sigh*

thanks ...

On Wed, 13 Feb 2002, Tom Lane wrote:

Show quoted text

"Marc G. Fournier" <scrappy@hub.org> writes:

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?
But, I have one field that has 5 distinct values:
But pg_stats is reporting 1:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

regards, tom lane

#4Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Marc G. Fournier (#3)
Re: "Bug" in statistics for v7.2?

That explains it ...

profiles_faith | count
----------------+--------
0 | 485938
1 | 2
2 | 6
7 | 2
8 | 21
(5 rows)

Cool, another waste of space *sigh*

thanks ...

On Wed, 13 Feb 2002, Tom Lane wrote:

"Marc G. Fournier" <scrappy@hub.org> writes:

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?
But, I have one field that has 5 distinct values:
But pg_stats is reporting 1:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

Hmm ? How about select * from xxx where profiles_faith = 7
would estimate all rows, no ? Instead of 2.
That is why I think a bin for "very uncommon" values could also be
useful sometimes.

Andreas

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#4)
Re: "Bug" in statistics for v7.2?

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

Hmm ? How about select * from xxx where profiles_faith = 7
would estimate all rows, no ? Instead of 2.

Not in 7.2 ... nor in previous versions AFAIR.

That is why I think a bin for "very uncommon" values could also be
useful sometimes.

Perhaps you should experiment or read the code before opining...

regards, tom lane