"Bug" in statistics for v7.2?

Started by The Hermit Hackerover 24 years ago5 messageshackers

The Hermit Hacker

scrappy@hub.org

over 24 years ago

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?

But, I have one field that has 5 distinct values:

iwantu=# select distinct(profiles_faith) from iwantu_profiles;
profiles_faith
----------------
0
1
2
7
8
(5 rows)

But pg_stats is reporting 1:

tablename | attname | avg_width | n_distinct
-----------------+------------------------+-----------+------------
iwantu_profiles | profiles_faith | 2 | 1

So am I reading n_distinct wrong?

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: The Hermit Hacker (#1)

Re: "Bug" in statistics for v7.2?

"Marc G. Fournier" <scrappy@hub.org> writes:

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?
But, I have one field that has 5 distinct values:
But pg_stats is reporting 1:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

regards, tom lane

The Hermit Hacker

scrappy@hub.org

over 24 years ago

In reply to: Tom Lane (#2)

Re: "Bug" in statistics for v7.2?

That explains it ...

profiles_faith | count
----------------+--------
0 | 485938
1 | 2
2 | 6
7 | 2
8 | 21
(5 rows)

Cool, another waste of space *sigh*

thanks ...

On Wed, 13 Feb 2002, Tom Lane wrote:

Show quoted text

"Marc G. Fournier" <scrappy@hub.org> writes:

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?
But, I have one field that has 5 distinct values:
But pg_stats is reporting 1:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

regards, tom lane

Zeugswetter Andreas SB SD

ZeugswetterA@spardat.at

over 24 years ago

In reply to: The Hermit Hacker (#3)

Re: "Bug" in statistics for v7.2?

That explains it ...

profiles_faith | count
----------------+--------
0 | 485938
1 | 2
2 | 6
7 | 2
8 | 21
(5 rows)

Cool, another waste of space *sigh*

thanks ...

On Wed, 13 Feb 2002, Tom Lane wrote:

"Marc G. Fournier" <scrappy@hub.org> writes:

Okay, if I'm understanding pg_stats at all, which I may not be, n_distinct
should represent # of distinct values in that row, no?
But, I have one field that has 5 distinct values:
But pg_stats is reporting 1:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

Hmm ? How about select * from xxx where profiles_faith = 7
would estimate all rows, no ? Instead of 2.
That is why I think a bin for "very uncommon" values could also be
useful sometimes.

Andreas

Import Notes

Resolved by subject fallback

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Zeugswetter Andreas SB SD (#4)

Re: "Bug" in statistics for v7.2?

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

The pg_stats values are only, um, statistical. If 99.9% of the table is
the same value and the other four values appear only once or twice, it's
certainly possible for ANALYZE's sample to include only the common value
and miss the rare ones. AFAIK that will not break anything; if you have
an example where the planner seems to be fooled because of this, let's
see it.

Hmm ? How about select * from xxx where profiles_faith = 7
would estimate all rows, no ? Instead of 2.

Not in 7.2 ... nor in previous versions AFAIR.

That is why I think a bin for "very uncommon" values could also be
useful sometimes.

Perhaps you should experiment or read the code before opining...

regards, tom lane