remove_useless_groupby_columns does not need to record constraint dependencies

Started by David Rowleyover 5 years ago5 messages
#1David Rowley
dgrowleyml@gmail.com
1 attachment(s)

Hi,

Over in [1]/messages/by-id/CAApHDvr4OW_OUd_Rxp0d1hRgz+a4mm8+8uR7QoM2VqKFX08SqA@mail.gmail.com, Tom and I had a discussion in response to some confusion
about why remove_useless_groupby_columns() goes to the trouble of
recording a dependency on the PRIMARY KEY constraint when removing
surplus columns from the GROUP BY clause.

The outcome was that we don't need to do this since
remove_useless_groupby_columns() is used only as a plan-time
optimisation, we don't need to record any dependency. Unlike
check_functional_grouping(), where we must record the dependency as we
may end up with a VIEW with columns, e.g, in the select list which are
functionally dependant on a pkey constraint. In that case, we must
ensure the view is also removed, or that the constraint removal is
blocked. There's no such requirement for planner smarts, such as the
one in remove_useless_groupby_columns() as in that case we'll trigger
a relcache invalidation during ALTER TABLE DROP CONSTRAINT, which
cached plans will notice when they obtain their locks just before
execution begins.

To prevent future confusion, I'd like to remove dependency recording
code from remove_useless_groupby_columns() and update the misleading
comment. Likely this should also be backpatched to 9.6.

Does anyone think differently?

A patch to do this is attached.

[1]: /messages/by-id/CAApHDvr4OW_OUd_Rxp0d1hRgz+a4mm8+8uR7QoM2VqKFX08SqA@mail.gmail.com

Attachments:

fix_remove_useless_groupby_columns.patchapplication/octet-stream; name=fix_remove_useless_groupby_columns.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b31c52404a..bb4c10b193 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3056,14 +3056,20 @@ limit_needed(Query *parse)
  * Since some other DBMSes do not allow references to ungrouped columns, it's
  * not unusual to find all columns listed in GROUP BY even though listing the
  * primary-key columns would be sufficient.  Deleting such excess columns
- * avoids redundant sorting work, so it's worth doing.  When we do this, we
- * must mark the plan as dependent on the pkey constraint (compare the
- * parser's check_ungrouped_columns() and check_functional_grouping()).
+ * avoids redundant sorting work, so it's worth doing.
  *
- * In principle, we could treat any NOT-NULL columns appearing in a UNIQUE
- * index as the determining columns.  But as with check_functional_grouping(),
- * there's currently no way to represent dependency on a NOT NULL constraint,
- * so we consider only the pkey for now.
+ * Currently we only make use of pkey constraints for this, however, we may
+ * wish to take this further in the future and also use unique constraints
+ * which have NOT NULL columns.  In the past, there had been some
+ * misconception that we could not do this due to lack of a way to invalidate
+ * plans when NOT NULL constraints are dropped.  Such plans will be
+ * invalidated through relcache invalidation during the ALTER TABLE and
+ * subsequent invalidation checks which are done when we're about to execute
+ * these plans during the call to AcquireExecutorLocks().
+ *
+ * This should not be confused with check_functional_grouping(), where we must
+ * record dependencies to ensure objects such as VIEWs do not become invalid
+ * due to a constraint being dropped.
  */
 static void
 remove_useless_groupby_columns(PlannerInfo *root)
@@ -3172,10 +3178,6 @@ remove_useless_groupby_columns(PlannerInfo *root)
 
 			/* Remember the attnos of the removable columns */
 			surplusvars[relid] = bms_difference(relattnos, pkattnos);
-
-			/* Also, mark the resulting plan as dependent on this constraint */
-			parse->constraintDeps = lappend_oid(parse->constraintDeps,
-												constraintOid);
 		}
 	}
 
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#1)
Re: remove_useless_groupby_columns does not need to record constraint dependencies

David Rowley <dgrowleyml@gmail.com> writes:

Over in [1], Tom and I had a discussion in response to some confusion
about why remove_useless_groupby_columns() goes to the trouble of
recording a dependency on the PRIMARY KEY constraint when removing
surplus columns from the GROUP BY clause.

The outcome was that we don't need to do this since
remove_useless_groupby_columns() is used only as a plan-time
optimisation, we don't need to record any dependency.

Right. I think it would be good for the comments to emphasize that
a relcache inval will be forced if the *index* underlying the pkey
constraint is dropped; the code doesn't care so much about the constraint
as such. (This is also why it'd be safe to use a plain unique index
for the same optimization, assuming you can independently verify
non-nullness of the columns. Maybe we should trash the existing coding
and just have it look for unique indexes + attnotnull flags.)

To prevent future confusion, I'd like to remove dependency recording
code from remove_useless_groupby_columns() and update the misleading
comment. Likely this should also be backpatched to 9.6.

+1 for removing the dependency and improving the comments in HEAD.
Minus quite a lot for back-patching: this is not a bug fix, and
there's a nonzero risk that we've overlooked something. I'd rather
find that out in beta testing than from bug reports against stable
branches.

regards, tom lane

#3David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#2)
1 attachment(s)
Re: remove_useless_groupby_columns does not need to record constraint dependencies

On Thu, 16 Apr 2020 at 03:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

Over in [1], Tom and I had a discussion in response to some confusion
about why remove_useless_groupby_columns() goes to the trouble of
recording a dependency on the PRIMARY KEY constraint when removing
surplus columns from the GROUP BY clause.

The outcome was that we don't need to do this since
remove_useless_groupby_columns() is used only as a plan-time
optimisation, we don't need to record any dependency.

Right. I think it would be good for the comments to emphasize that
a relcache inval will be forced if the *index* underlying the pkey
constraint is dropped; the code doesn't care so much about the constraint
as such. (This is also why it'd be safe to use a plain unique index
for the same optimization, assuming you can independently verify
non-nullness of the columns.

I've reworded the comment in the attached version.

Maybe we should trash the existing coding
and just have it look for unique indexes + attnotnull flags.)

I'd like to, but the timing seems off. Perhaps after we branch for PG14.

To prevent future confusion, I'd like to remove dependency recording
code from remove_useless_groupby_columns() and update the misleading
comment. Likely this should also be backpatched to 9.6.

+1 for removing the dependency and improving the comments in HEAD.
Minus quite a lot for back-patching: this is not a bug fix, and
there's a nonzero risk that we've overlooked something. I'd rather
find that out in beta testing than from bug reports against stable
branches.

That seems fair.

David

Attachments:

fix_remove_useless_groupby_columns2.patchapplication/octet-stream; name=fix_remove_useless_groupby_columns2.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b31c52404a..e664eb18c0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3056,14 +3056,16 @@ limit_needed(Query *parse)
  * Since some other DBMSes do not allow references to ungrouped columns, it's
  * not unusual to find all columns listed in GROUP BY even though listing the
  * primary-key columns would be sufficient.  Deleting such excess columns
- * avoids redundant sorting work, so it's worth doing.  When we do this, we
- * must mark the plan as dependent on the pkey constraint (compare the
- * parser's check_ungrouped_columns() and check_functional_grouping()).
+ * avoids redundant sorting work, so it's worth doing.
  *
- * In principle, we could treat any NOT-NULL columns appearing in a UNIQUE
- * index as the determining columns.  But as with check_functional_grouping(),
- * there's currently no way to represent dependency on a NOT NULL constraint,
- * so we consider only the pkey for now.
+ * Relcache invalidations will ensure that cached plans become invalidated
+ * when the underlying index of the pkey constraint is dropped.
+ *
+ * Currently, we only make use of pkey constraints for this, however, we may
+ * wish to take this further in the future and also use unique constraints
+ * which have NOT NULL columns.  In that case, plan invalidation will still
+ * work since relations will receive a relcache invalidation when a NOT NULL
+ * constraint is dropped.
  */
 static void
 remove_useless_groupby_columns(PlannerInfo *root)
@@ -3172,10 +3174,6 @@ remove_useless_groupby_columns(PlannerInfo *root)
 
 			/* Remember the attnos of the removable columns */
 			surplusvars[relid] = bms_difference(relattnos, pkattnos);
-
-			/* Also, mark the resulting plan as dependent on this constraint */
-			parse->constraintDeps = lappend_oid(parse->constraintDeps,
-												constraintOid);
 		}
 	}
 
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#3)
Re: remove_useless_groupby_columns does not need to record constraint dependencies

David Rowley <dgrowleyml@gmail.com> writes:

I've reworded the comment in the attached version.

LGTM.

regards, tom lane

#5David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#4)
Re: remove_useless_groupby_columns does not need to record constraint dependencies

On Fri, 17 Apr 2020 at 02:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've reworded the comment in the attached version.

LGTM.

Thanks for reviewing. Pushed.