stawidth inconsistency with all NULL columns
Consider:
CREATE TABLE testwid
(
txtnotnull text,
txtnull text,
int8notnull int8,
int8null int8
);
INSERT INTO testwid
SELECT 'a' || g.i,
NULL,
g.i,
NULL
FROM generate_series(1,10000) AS g(i);
ANALYZE testwid;
SELECT attname, avg_width FROM pg_stats WHERE tablename = 'testwid';
attname | avg_width
-------------+-----------
txtnotnull | 5
txtnull | 0
int8notnull | 8
int8null | 8
(4 rows)
I see in analyze.c
8<-----------------
/* We can only compute average width if we found some non-null values.*/
if (nonnull_cnt > 0)
[snip]
else if (null_cnt > 0)
{
/* We found only nulls; assume the column is entirely null */
stats->stats_valid = true;
stats->stanullfrac = 1.0;
if (is_varwidth)
stats->stawidth = 0; /* "unknown" */
else
stats->stawidth = stats->attrtype->typlen;
stats->stadistinct = 0.0; /* "unknown" */
}
8<-----------------
So apparently intentional, but seems gratuitously inconsistent. Could
this cause any actual inconsistent behaviors? In any case that first
comment does not reflect the code.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Joe Conway <mail@joeconway.com> writes:
else if (null_cnt > 0)
{
/* We found only nulls; assume the column is entirely null */
stats->stats_valid = true;
stats->stanullfrac = 1.0;
if (is_varwidth)
stats->stawidth = 0; /* "unknown" */
else
stats->stawidth = stats->attrtype->typlen;
stats->stadistinct = 0.0; /* "unknown" */
}
8<-----------------
So apparently intentional, but seems gratuitously inconsistent. Could
this cause any actual inconsistent behaviors? In any case that first
comment does not reflect the code.
Are you suggesting that we should set stawidth to zero even for a
fixed-width datatype? That seems pretty silly. We know exactly what
the value should be, and would be if we'd chanced to find even one
non-null entry.
regards, tom lane
On 5/21/19 3:55 PM, Tom Lane wrote:
Joe Conway <mail@joeconway.com> writes:
else if (null_cnt > 0)
{
/* We found only nulls; assume the column is entirely null */
stats->stats_valid = true;
stats->stanullfrac = 1.0;
if (is_varwidth)
stats->stawidth = 0; /* "unknown" */
else
stats->stawidth = stats->attrtype->typlen;
stats->stadistinct = 0.0; /* "unknown" */
}
8<-----------------So apparently intentional, but seems gratuitously inconsistent. Could
this cause any actual inconsistent behaviors? In any case that first
comment does not reflect the code.Are you suggesting that we should set stawidth to zero even for a
fixed-width datatype? That seems pretty silly. We know exactly what
the value should be, and would be if we'd chanced to find even one
non-null entry.
Well you could argue in similar fashion for variable width values -- if
we find even one of those, it will be at least 4 bytes. So why set those
to zero?
Not a big deal, but it struck me as odd when I was looking at the
current state of affairs.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Joe Conway <mail@joeconway.com> writes:
On 5/21/19 3:55 PM, Tom Lane wrote:
Are you suggesting that we should set stawidth to zero even for a
fixed-width datatype? That seems pretty silly. We know exactly what
the value should be, and would be if we'd chanced to find even one
non-null entry.
Well you could argue in similar fashion for variable width values -- if
we find even one of those, it will be at least 4 bytes. So why set those
to zero?
Um, really the minimum width is 1 byte, given short headers. But as
the code notes, zero means we don't know what a sane estimate would
be, which is certainly not the case for fixed-width types.
regards, tom lane