Towards easier AMs: Cleaning up inappropriate use of name "relkind"
Hackers,
The name "relkind" normally refers to a field of type 'char' with values like 'r' for "table" and 'i' for "index". In AlterTableStmt and CreateTableAsStmt, this naming convention was abused for a field of type enum ObjectType. Often, such fields are named "objtype", though also "kind", "removeType", "renameType", etc.
I don't care to debate those other names, though in passing I'll say that "kind" seems not great. The "relkind" name is particularly bad, though. It is confusing in functions that also operate on a RangeTblEntry object, which also has a field named "relkind", and is confusing in light of the function get_relkind_objtype() which maps from "relkind" to "objtype", implying quite correctly that those two things are distinct.
The attached patch cleans this up. How many toes am I stepping on here? Changing the names was straightforward and doesn't seem to cause any issues with 'make check-world'. Any objection?
For those interested in the larger context of this patch, I am trying to clean up any part of the code that makes it harder to write and test new access methods. When implementing a new AM, one currently needs to `grep -i relkind` to find a long list of files that need special treatment. One then needs to consider whether special logic for the new AM needs to be inserted into all these spots. As such, it is nice if these spots have as little confusing naming as possible. This patch makes that process a little easier. I have another patch (to be posted shortly) that cleans up the #define RELKIND_XXX stuff using a new RelKind enum and special macros while keeping the relkind fields as type 'char'. Along with converting code to use switch(relkind) rather than if (relkind...) statements, the compiler now warns on unhandled cases when you add a new RelKind to the list, making it easier to find all the places you need to update. I decided to keep that work independent of this patch, as the code is logically distinct.
Attachments:
v1-0001-Renaming-relkind-as-objtype-where-appropriate.patchapplication/octet-stream; name=v1-0001-Renaming-relkind-as-objtype-where-appropriate.patch; x-unix-mode=0644Download+37-38
On 2020-Jun-03, Mark Dilger wrote:
The name "relkind" normally refers to a field of type 'char' with
values like 'r' for "table" and 'i' for "index". In AlterTableStmt
and CreateTableAsStmt, this naming convention was abused for a field
of type enum ObjectType.
I agree that "relkind" here is a misnomer, and I bet that what happened
here is that the original patch Gavin developed was using the relkind
enum from pg_class and was later changed to the OBJECT_ defines after
patch review, but the struct member name remained. I don't object to
the proposed renaming.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3 Jun 2020, at 19:05, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The attached patch cleans this up.
The gram.y hunks in this patch no longer applies, please submit a rebased
version. I'm marking the entry Waiting on Author in the meantime.
cheers ./daniel
On Jul 1, 2020, at 2:45 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
On 3 Jun 2020, at 19:05, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The attached patch cleans this up.
The gram.y hunks in this patch no longer applies, please submit a rebased
version. I'm marking the entry Waiting on Author in the meantime.
Rebased patch attached. Thanks for mentioning it!
Attachments:
v2-0001-Renaming-relkind-as-objtype-where-appropriate.patchapplication/octet-stream; name=v2-0001-Renaming-relkind-as-objtype-where-appropriate.patch; x-unix-mode=0644Download+37-38
On Jun 3, 2020, at 10:05 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I have another patch (to be posted shortly) that cleans up the #define RELKIND_XXX stuff using a new RelKind enum and special macros while keeping the relkind fields as type 'char'. Along with converting code to use switch(relkind) rather than if (relkind...) statements, the compiler now warns on unhandled cases when you add a new RelKind to the list, making it easier to find all the places you need to update. I decided to keep that work independent of this patch, as the code is logically distinct.
Most of the work in this patch is mechanical replacement of if/else if/else statements which hinge on relkind to switch statements on relkind. The patch is not philosophically very interesting, but it is fairly long. Reviewers might start by scrolling down the patch to the changes in src/include/catalog/pg_class.h
There are no intentional behavioral changes in this patch.
Attachments:
v2-0002-Refactoring-relkind-handling.patchapplication/octet-stream; name=v2-0002-Refactoring-relkind-handling.patch; x-unix-mode=0644Download+8280-4157
On Wed, Jul 01, 2020 at 09:46:34AM -0700, Mark Dilger wrote:
Rebased patch attached. Thanks for mentioning it!
There are two patches on this thread v2-0001 being much smaller than
v2-0002. I have looked at 0001 for now, and, like Alvaro, this
renaming makes sense to me. Those commands work on objects that are
relkinds, except for one OBJECT_TYPE. So, let's get 0001 patch
merged. Any objections from others?
--
Michael
On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote:
There are two patches on this thread v2-0001 being much smaller than
v2-0002. I have looked at 0001 for now, and, like Alvaro, this
renaming makes sense to me. Those commands work on objects that are
relkinds, except for one OBJECT_TYPE. So, let's get 0001 patch
merged. Any objections from others?
I have been through this one again and applied it as cc35d89.
--
Michael
On Wed, Jul 01, 2020 at 05:04:19PM -0700, Mark Dilger wrote:
Most of the work in this patch is mechanical replacement of if/else
if/else statements which hinge on relkind to switch statements on
relkind. The patch is not philosophically very interesting, but it
is fairly long. Reviewers might start by scrolling down the patch
to the changes in src/include/catalog/pg_class.hThere are no intentional behavioral changes in this patch.
Please note that 0002 does not apply anymore, there are conflicts in
heap.c and tablecmds.c, and that there are noise diffs, likely because
you ran pgindent and included places unrelated to this patch. Anyway,
I am not really a fan of this patch. I could see a benefit in
switching to an enum so as for places where we use a switch/case
without a default we would be warned if a new relkind gets added or if
a value is not covered, but then we should not really need
RELKIND_NULL, no?
--
Michael
On Jul 10, 2020, at 9:44 PM, Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 08, 2020 at 10:00:47PM +0900, Michael Paquier wrote:
There are two patches on this thread v2-0001 being much smaller than
v2-0002. I have looked at 0001 for now, and, like Alvaro, this
renaming makes sense to me. Those commands work on objects that are
relkinds, except for one OBJECT_TYPE. So, let's get 0001 patch
merged. Any objections from others?I have been through this one again and applied it as cc35d89.
--
Michael
Thanks for committing!
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 10, 2020, at 11:00 PM, Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 01, 2020 at 05:04:19PM -0700, Mark Dilger wrote:
Most of the work in this patch is mechanical replacement of if/else
if/else statements which hinge on relkind to switch statements on
relkind. The patch is not philosophically very interesting, but it
is fairly long. Reviewers might start by scrolling down the patch
to the changes in src/include/catalog/pg_class.hThere are no intentional behavioral changes in this patch.
Please note that 0002 does not apply anymore, there are conflicts in
heap.c and tablecmds.c, and that there are noise diffs, likely because
you ran pgindent and included places unrelated to this patch.
I can resubmit, but should like to address your second point before bothering...
Anyway,
I am not really a fan of this patch. I could see a benefit in
switching to an enum so as for places where we use a switch/case
without a default we would be warned if a new relkind gets added or if
a value is not covered, but then we should not really need
RELKIND_NULL, no?
There are code paths where relkind is sometimes '\0' under normal, non-exceptional conditions. This happens in
allpaths.c: set_append_rel_size
rewriteHandler.c: view_query_is_auto_updatable
lockcmds.c: LockViewRecurse_walker
pg_depend.c: getOwnedSequences_internal
Doesn't this justify having RELKIND_NULL in the enum?
It is not the purpose of this patch to change the behavior of the code. This is just a structural patch, using an enum and switches rather than char and if/else if/else blocks.
Subsequent patches could build on this work, such as changing the behavior when code encounters a relkind value outside the code's expected set of relkind values. Whether those patches would add Assert()s, elog()s, or ereport()s is not something I'd like to have to debate as part of this patch submission. Assert()s have the advantage of costing nothing in production builds, but elog()s have the advantage of protecting against corrupt relkind values at runtime in production.
Getting the compiler to warn when a new relkind is added to the enumeration but not handled in a switch is difficult. One strategy is to add -Wswitch-enum, but that would require refactoring switches over all enums, not just over the RelKind enum, and for some enums, that would require a large number of extra lines to be added to the code. Another strategy is to remove the default label from switches over RelKind, but that removes protections against invalid relkinds being encountered.
Do you have a preference about which directions I should pursue? Or do you think the patch idea itself is dead?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes:
On Jul 10, 2020, at 11:00 PM, Michael Paquier <michael@paquier.xyz> wrote:
I am not really a fan of this patch. I could see a benefit in
switching to an enum so as for places where we use a switch/case
without a default we would be warned if a new relkind gets added or if
a value is not covered, but then we should not really need
RELKIND_NULL, no?
There are code paths where relkind is sometimes '\0' under normal, non-exceptional conditions. This happens in
allpaths.c: set_append_rel_size
rewriteHandler.c: view_query_is_auto_updatable
lockcmds.c: LockViewRecurse_walker
pg_depend.c: getOwnedSequences_internal
Doesn't this justify having RELKIND_NULL in the enum?
I'd say no. I think including an intentionally invalid value in such
an enum is horrid, mainly because it will force a lot of places to cover
that value when they shouldn't (or else draw "enum value not handled in
switch" warnings). The confusion factor about whether it maybe *is*
a valid value is not to be discounted, either.
If we can't readily get rid of the use of '\0' in these code paths,
maybe trying to convert to an enum isn't going to be a win after all.
Getting the compiler to warn when a new relkind is added to the
enumeration but not handled in a switch is difficult.
We already have a project policy about how to do that.
regards, tom lane
On Sat, Jul 11, 2020 at 03:32:55PM -0400, Tom Lane wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
There are code paths where relkind is sometimes '\0' under normal,
non-exceptional conditions. This happens inallpaths.c: set_append_rel_size
rewriteHandler.c: view_query_is_auto_updatable
lockcmds.c: LockViewRecurse_walker
pg_depend.c: getOwnedSequences_internal
There are more code paths than what's mentioned upthread when it comes
to relkinds and \0. For example, I can quickly grep for acl.c that
relies on get_rel_relkind() returning \0 when the relkind cannot be
found. And we do that for get_typtype() as well in the syscache.
Doesn't this justify having RELKIND_NULL in the enum?
I'd say no. I think including an intentionally invalid value in such
an enum is horrid, mainly because it will force a lot of places to cover
that value when they shouldn't (or else draw "enum value not handled in
switch" warnings). The confusion factor about whether it maybe *is*
a valid value is not to be discounted, either.
I agree here that the situation could be improved because we never
store this value in the catalogs. Perhaps there would be a benefit in
switching to an enum in the long run, I am not sure. But if we do so,
RELKIND_NULL should not be around.
--
Michael
On Jul 12, 2020, at 4:59 AM, Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jul 11, 2020 at 03:32:55PM -0400, Tom Lane wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
There are code paths where relkind is sometimes '\0' under normal,
non-exceptional conditions. This happens inallpaths.c: set_append_rel_size
rewriteHandler.c: view_query_is_auto_updatable
lockcmds.c: LockViewRecurse_walker
pg_depend.c: getOwnedSequences_internalThere are more code paths than what's mentioned upthread when it comes
to relkinds and \0. For example, I can quickly grep for acl.c that
relies on get_rel_relkind() returning \0 when the relkind cannot be
found. And we do that for get_typtype() as well in the syscache.
I was thinking about places in the code that test a relkind variable against a list of values, rather than places that return a relkind to callers, though certainly those two things are related. It's possible that I've missed some places in the code where \0 might be encountered, but I've added Asserts against unexpected values in v3.
I left get_rel_relkind() as is. There does not seem to be anything wrong with it returning \0 as long as all callers are prepared to deal with that result.
Doesn't this justify having RELKIND_NULL in the enum?
I'd say no. I think including an intentionally invalid value in such
an enum is horrid, mainly because it will force a lot of places to cover
that value when they shouldn't (or else draw "enum value not handled in
switch" warnings). The confusion factor about whether it maybe *is*
a valid value is not to be discounted, either.I agree here that the situation could be improved because we never
store this value in the catalogs. Perhaps there would be a benefit in
switching to an enum in the long run, I am not sure. But if we do so,
RELKIND_NULL should not be around.
In the v3 patch, I have removed RELKIND_NULL from the enum, and also removed default: labels from switches over RelKind. The patch is also rebased.
Attachments:
v3-0001-Refactoring-relkind-handling.patchapplication/octet-stream; name=v3-0001-Refactoring-relkind-handling.patch; x-unix-mode=0644Download+8020-4324
This patch is way too large. Probably in part explained by the fact
that you seem to have run pgindent and absorbed a lot of unrelated
changes. This makes the patch essentially unreviewable.
I think you should define a RelationGetRelkind() static function that
returns the appropriate datatype without requiring a cast and assert in
every single place that processes a relation's relkind. Similarly
you've chosen to leave get_rel_relkind untouched, but that seems unwise.
I think the chr_ macros are pointless.
Reading back the thread, it seems that the whole point of your patch was
to change the tests that currently use 'if' tests to switch blocks. I
cannot understand what's the motivation for that, but it appears to me
that the approach is backwards: I'd counsel to *first* change the APIs
(get_rel_relkind and defining an enum, plus adding RelationGetRelkind)
so that everything is more sensible and safe, including appropriate
answers for the places where an "invalid" relkind is returned; and once
that's in place, replace if tests with switch blocks where it makes
sense to do so.
Also, I suggest that this thread is not a good one for this patch.
Subject is entirely not appropriate. Open a new thread perhaps?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Jul 14, 2020, at 4:12 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
This patch is way too large. Probably in part explained by the fact
that you seem to have run pgindent and absorbed a lot of unrelated
changes. This makes the patch essentially unreviewable.
I did not run pgindent, but when changing
if (relkind == RELKIND_INDEX)
{
/* foo */
}
to
switch (relkind)
{
case RELKIND_INDEX:
/* foo */
}
the indentation of /* foo */ changes. For large foo, that results in a lot of lines. There are also cases in the code where comparisons of multiple variables are mixed together. To split those out into switch/case statements I had to rearrange some of the code blocks.
I think you should define a RelationGetRelkind() static function that
returns the appropriate datatype without requiring a cast and assert in
every single place that processes a relation's relkind. Similarly
you've chosen to leave get_rel_relkind untouched, but that seems unwise.
I was well aware of how large the patch had gotten, and didn't want to add more....
I think the chr_ macros are pointless.
Look more closely at the #define RelKindAsString(x) CppAsString2(chr_##x)
Reading back the thread, it seems that the whole point of your patch was
to change the tests that currently use 'if' tests to switch blocks. I
cannot understand what's the motivation for that,
There might not be sufficient motivation to make the patch worth doing. The motivation was to leverage the project's recent addition of -Wswitch to make it easier to know which code needs updating when you add a new relkind. That doesn't happen very often, but I was working towards that kind of thing, and thought this might be a good prerequisite patch for that work. Stylistically, I also prefer
+ switch ((RelKind) rel->rd_rel->relkind)
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ case RELKIND_TOASTVALUE:
over
- if (rel->rd_rel->relkind == RELKIND_RELATION ||
- rel->rd_rel->relkind == RELKIND_MATVIEW ||
- rel->rd_rel->relkind == RELKIND_TOASTVALUE)
which is a somewhat common pattern in the code. It takes less mental effort to see that only one variable is being compared against those three enum values. In some cases, though not necessarily this exact example, it also *might* save duplicated work computing the variable, depending on the situation and what the compiler can optimize away.
but it appears to me
that the approach is backwards: I'd counsel to *first* change the APIs
(get_rel_relkind and defining an enum, plus adding RelationGetRelkind)
so that everything is more sensible and safe, including appropriate
answers for the places where an "invalid" relkind is returned;
Ok.
and once
that's in place, replace if tests with switch blocks where it makes
sense to do so.
Ok.
Also, I suggest that this thread is not a good one for this patch.
Subject is entirely not appropriate. Open a new thread perhaps?
I've changed the subject line.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company