extended stats not friendly towards ANALYZE with subset of columns
I'm just reviewing Tomas' code for the dependencies part of the stats
when I saw something that looked a bit unusual.
I tested with:
CREATE TABLE ab1 (a INTEGER, b INTEGER);
ALTER TABLE ab1 ALTER a SET STATISTICS 0;
INSERT INTO ab1 SELECT a, a%23 FROM generate_series(1, 1000) a;
CREATE STATISTICS ab1_a_b_stats ON (a, b) FROM ab1;
ANALYZE ab1;
And got:
ERROR: extended statistics could not be collected for column "a" of
relation public.ab1
HINT: Consider ALTER TABLE "public"."ab1" ALTER "a" SET STATISTICS -1
I don't think the error is useful here, as it means nothing gets done.
Probably better to just not (re)build those stats.
Another option would be to check for extended stats before deciding
which rows to ANALYZE, then still gathering the columns required for
MV stats, but I think if the user asks for a subset of columns to be
analyzed, and that causes a column to be missing for an extended
statistics, that it would be pretty surprising if we rebuild the
extended stats.
Perhaps the SET STATISTIC 0 for a column still needs to gather data
for extended statistics, though. Perhaps a debate should ensue about
how that should work exactly.
I've attached a patch which fixes the problem above, but it does
nothing to change the analyze behaviour for 0 statistics columns.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
stats_ext_analyze_fix.patchapplication/octet-stream; name=stats_ext_analyze_fix.patchDownload
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index d2b9f6a..3deabc4 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -44,8 +44,8 @@ typedef struct StatExtEntry
static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid);
-static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
- int natts, VacAttrStats **vacattrstats);
+static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *stakeys,
+ int nvacatts, VacAttrStats **vacattrstats);
static void statext_store(Relation pg_stext, Oid relid,
MVNDistinct *ndistinct,
VacAttrStats **stats);
@@ -77,10 +77,14 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
VacAttrStats **stats;
ListCell *lc2;
- /* filter only the interesting vacattrstats records */
+ /* check if we can build these stats based on the column analyzed */
stats = lookup_var_attr_stats(onerel, stat->columns,
natts, vacattrstats);
+ /* nope, skip it */
+ if (!stats)
+ continue;
+
/* check allowed number of dimensions */
Assert(bms_num_members(stat->columns) >= 2 &&
bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);
@@ -191,31 +195,33 @@ fetch_statentries_for_relation(Relation pg_statext, Oid relid)
}
/*
- * Using 'vacattrstats' of size 'natts' as input data, return a newly built
- * VacAttrStats array which includes only the items corresponding to attributes
- * indicated by 'attrs'.
+ * Using 'vacattrstats' of size 'nvacatts' as input data, return a newly built
+ * VacAttrStats array which includes only the items corresponding to
+ * attributes indicated by 'stakeys'. If we don't have all of the per column
+ * stats available to compute the extended stats, then we return NULL to indicate
+ * to the caller that the stats should not be built.
*/
static VacAttrStats **
-lookup_var_attr_stats(Relation rel, Bitmapset *attrs, int natts,
+lookup_var_attr_stats(Relation rel, Bitmapset *stakeys, int nvacatts,
VacAttrStats **vacattrstats)
{
int i = 0;
- int x = -1;
+ int key;
VacAttrStats **stats;
- Bitmapset *matched = NULL;
stats = (VacAttrStats **)
- palloc(bms_num_members(attrs) * sizeof(VacAttrStats *));
+ palloc(bms_num_members(stakeys) * sizeof(VacAttrStats *));
/* lookup VacAttrStats info for the requested columns (same attnum) */
- while ((x = bms_next_member(attrs, x)) >= 0)
+ key = -1;
+ while ((key = bms_next_member(stakeys, key)) >= 0)
{
int j;
stats[i] = NULL;
- for (j = 0; j < natts; j++)
+ for (j = 0; j < nvacatts; j++)
{
- if (x == vacattrstats[j]->tupattnum)
+ if (key == vacattrstats[j]->tupattnum)
{
stats[i] = vacattrstats[j];
break;
@@ -223,29 +229,24 @@ lookup_var_attr_stats(Relation rel, Bitmapset *attrs, int natts,
}
if (!stats[i])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("extended statistics could not be collected for column \"%s\" of relation %s.%s",
- NameStr(RelationGetDescr(rel)->attrs[x - 1]->attname),
- get_namespace_name(rel->rd_rel->relnamespace),
- RelationGetRelationName(rel)),
- errhint("Consider ALTER TABLE \"%s\".\"%s\" ALTER \"%s\" SET STATISTICS -1",
- get_namespace_name(rel->rd_rel->relnamespace),
- RelationGetRelationName(rel),
- NameStr(RelationGetDescr(rel)->attrs[x - 1]->attname))));
-
- /*
- * Check that we found a non-dropped column and that the attnum
- * matches.
- */
+ {
+ /*
+ * Looks like stats were not gathered for one of the columns
+ * required. We'll be unable to build the extended stats without
+ * this column.
+ */
+ pfree(stats);
+ return NULL;
+ }
+
+ /*
+ * Sanity check that the column is not dropped - stats should have
+ * been removed in this case.
+ */
Assert(!stats[i]->attr->attisdropped);
- matched = bms_add_member(matched, stats[i]->tupattnum);
i++;
}
- if (bms_subset_compare(matched, attrs) != BMS_EQUAL)
- elog(ERROR, "could not find all attributes in attribute stats array");
- bms_free(matched);
return stats;
}
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 8fe96d6..031edbd 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -34,17 +34,6 @@ Statistics:
"public.ab1_b_c_stats" WITH (ndistinct) ON (b, c)
DROP TABLE ab1;
--- Ensure things work sanely with SET STATISTICS 0
-CREATE TABLE ab1 (a INTEGER, b INTEGER);
-ALTER TABLE ab1 ALTER a SET STATISTICS 0;
-INSERT INTO ab1 SELECT a, a%23 FROM generate_series(1, 1000) a;
-CREATE STATISTICS ab1_a_b_stats ON (a, b) FROM ab1;
-ANALYZE ab1;
-ERROR: extended statistics could not be collected for column "a" of relation public.ab1
-HINT: Consider ALTER TABLE "public"."ab1" ALTER "a" SET STATISTICS -1
-ALTER TABLE ab1 ALTER a SET STATISTICS -1;
-ANALYZE ab1;
-DROP TABLE ab1;
-- n-distinct tests
CREATE TABLE ndistinct (
filler1 TEXT,
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 4faaf88..f0fdf9f 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -28,17 +28,6 @@ ALTER TABLE ab1 DROP COLUMN a;
\d ab1
DROP TABLE ab1;
--- Ensure things work sanely with SET STATISTICS 0
-CREATE TABLE ab1 (a INTEGER, b INTEGER);
-ALTER TABLE ab1 ALTER a SET STATISTICS 0;
-INSERT INTO ab1 SELECT a, a%23 FROM generate_series(1, 1000) a;
-CREATE STATISTICS ab1_a_b_stats ON (a, b) FROM ab1;
-ANALYZE ab1;
-ALTER TABLE ab1 ALTER a SET STATISTICS -1;
-ANALYZE ab1;
-DROP TABLE ab1;
-
-
-- n-distinct tests
CREATE TABLE ndistinct (
filler1 TEXT,
On Tue, Mar 28, 2017 at 10:30:03PM +1300, David Rowley wrote:
I'm just reviewing Tomas' code for the dependencies part of the stats
when I saw something that looked a bit unusual.I tested with:
CREATE TABLE ab1 (a INTEGER, b INTEGER);
ALTER TABLE ab1 ALTER a SET STATISTICS 0;
INSERT INTO ab1 SELECT a, a%23 FROM generate_series(1, 1000) a;
CREATE STATISTICS ab1_a_b_stats ON (a, b) FROM ab1;
ANALYZE ab1;And got:
ERROR: extended statistics could not be collected for column "a" of
relation public.ab1
HINT: Consider ALTER TABLE "public"."ab1" ALTER "a" SET STATISTICS -1I don't think the error is useful here, as it means nothing gets done.
Probably better to just not (re)build those stats.Another option would be to check for extended stats before deciding
which rows to ANALYZE, then still gathering the columns required for
MV stats, but I think if the user asks for a subset of columns to be
analyzed, and that causes a column to be missing for an extended
statistics, that it would be pretty surprising if we rebuild the
extended stats.Perhaps the SET STATISTIC 0 for a column still needs to gather data
for extended statistics, though. Perhaps a debate should ensue about
how that should work exactly.I've attached a patch which fixes the problem above, but it does
nothing to change the analyze behaviour for 0 statistics columns.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. �lvaro,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Noah Misch wrote:
The above-described topic is currently a PostgreSQL 10 open item. �lvaro,
since you committed the patch believed to have created it, you own this open
item.
I'll commit a fix for this problem no later than Friday 14th.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 05, 2017 at 08:58:54AM -0300, Alvaro Herrera wrote:
Noah Misch wrote:
The above-described topic is currently a PostgreSQL 10 open item. �lvaro,
since you committed the patch believed to have created it, you own this open
item.I'll commit a fix for this problem no later than Friday 14th.
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Noah Misch wrote:
On Wed, Apr 05, 2017 at 08:58:54AM -0300, Alvaro Herrera wrote:
Noah Misch wrote:
The above-described topic is currently a PostgreSQL 10 open item. �lvaro,
since you committed the patch believed to have created it, you own this open
item.I'll commit a fix for this problem no later than Friday 14th.
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
I'm working on the patch currently and intend to get it pushed no later
than tomorrow 19:00 America/Santiago.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
David Rowley wrote:
ERROR: extended statistics could not be collected for column "a" of
relation public.ab1
HINT: Consider ALTER TABLE "public"."ab1" ALTER "a" SET STATISTICS -1I don't think the error is useful here, as it means nothing gets done.
Probably better to just not (re)build those stats.
Agreed, an error there is a bad idea.
Another option would be to check for extended stats before deciding
which rows to ANALYZE, then still gathering the columns required for
MV stats, but I think if the user asks for a subset of columns to be
analyzed, and that causes a column to be missing for an extended
statistics, that it would be pretty surprising if we rebuild the
extended stats.
That would be very surprising indeed.
Perhaps the SET STATISTIC 0 for a column still needs to gather data
for extended statistics, though. Perhaps a debate should ensue about
how that should work exactly.
Hmm. I'd rather throw a warning at either CREATE STATISTICS time or at
ALTER TABLE SET STATISTICS time, rather than magically changing the set
of columns at analyze time.
I've attached a patch which fixes the problem above, but it does
nothing to change the analyze behaviour for 0 statistics columns.
Pushed, after tweaking so that a WARNING is still emitted, because it's
likely to be useful in pointing out a procedural mistake while extended
stats are being tested.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 18 April 2017 at 05:12, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Pushed, after tweaking so that a WARNING is still emitted, because it's
likely to be useful in pointing out a procedural mistake while extended
stats are being tested.
Thanks for pushing.
Seems you maintained most of my original patch and added to it a bit,
but credits don't seem to reflect that. It's not the end of the world
but just wanted to note that.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
David Rowley wrote:
On 18 April 2017 at 05:12, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Pushed, after tweaking so that a WARNING is still emitted, because it's
likely to be useful in pointing out a procedural mistake while extended
stats are being tested.Thanks for pushing.
Seems you maintained most of my original patch and added to it a bit,
but credits don't seem to reflect that. It's not the end of the world
but just wanted to note that.
Yeah, sorry about that. Maybe we should start mentioning the percentage
of the patch that comes from each author (just kidding).
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers