Guarantees/Semantics of pg_stats

Started by Baziotis, Stefanosabout 2 years ago4 messagesgeneral
Jump to latest
#1Baziotis, Stefanos
sb54@illinois.edu

Hi,

I'm interested in learning more about the guarantees/semantics of pg_stats. For example, is there a guarantee that the n_distinct and most_common_vals fields will take into account any values appearing more than M times or maybe with frequence more than f? In what cases will n_distinct and most_common_vals will miss some values?

Any pointers to documentation or source code would be greatly appreciated.

Best,
Stefanos

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Baziotis, Stefanos (#1)
Re: Guarantees/Semantics of pg_stats

On Sat, 2024-03-02 at 07:41 +0000, Baziotis, Stefanos wrote:

I'm interested in learning more about the guarantees/semantics of pg_stats.
For example, is there a guarantee that the n_distinct and most_common_vals
fields will take into account any values appearing more than M times or
maybe with frequence more than f? In what cases will n_distinct and
most_common_vals will miss some values?

Table Statistics are not exact. They are collected from a random sample of
the data, so they are never guaranteed to be exact.

Their purpose is to estimate the result row count and cost of execution plan
steps. You can never use them as proof.

Yours,
Laurenz Albe

#3Baziotis, Stefanos
sb54@illinois.edu
In reply to: Laurenz Albe (#2)
Re: Guarantees/Semantics of pg_stats

Hi Laurenz,

Thanks for replying. I see. Can I maybe get accurate information if the column has an index? In other words, are there any type of indexes through which I can get the number of distinct values or the values themselves, without needing to scan the column?

Best,
Stefanos
________________________________
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Saturday, March 2, 2024 04:28
To: Baziotis, Stefanos <sb54@illinois.edu>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Guarantees/Semantics of pg_stats

On Sat, 2024-03-02 at 07:41 +0000, Baziotis, Stefanos wrote:

I'm interested in learning more about the guarantees/semantics of pg_stats.
For example, is there a guarantee that the n_distinct and most_common_vals
fields will take into account any values appearing more than M times or
maybe with frequence more than f? In what cases will n_distinct and
most_common_vals will miss some values?

Table Statistics are not exact. They are collected from a random sample of
the data, so they are never guaranteed to be exact.

Their purpose is to estimate the result row count and cost of execution plan
steps. You can never use them as proof.

Yours,
Laurenz Albe

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Baziotis, Stefanos (#3)
Re: Guarantees/Semantics of pg_stats

On Sat, 2024-03-02 at 22:29 +0000, Baziotis, Stefanos wrote:

Can I maybe get accurate information if the column has an index? In other words,
are there any type of indexes through which I can get the number of distinct
values or the values themselves, without needing to scan the column?

No.

Yours,
Laurenz Albe