Reducing header interdependencies around heapam.h et al.

Started by Andres Freundover 7 years ago11 messageshackers
Jump to latest
#1Andres Freund
andres@anarazel.de

Hi,

While working on pluggable storage (in particular, while cleaning it up
over the last few days), I grew concerned with widely heapam.h is
included in other headers. E.g. the executor (via execnodes.h,
executor.h relying on heapam.h) shouldn't depend on heapam.h details -
particularly after pluggable storage, but also before. To me that's
unnecessary leakage across abstraction boundaries.

In the attached patch I excised all heapam.h includes from other
headers. There were basically two things required to do so:

* In a few places that use HeapScanDesc (which confusingly is a typedef
in heapam.h, but the underlying struct is in relscan.h) etc, we can
easily get by just using struct HeapScanDescData * instead.

* I moved the LockTupleMode enum to to lockoptions.h - previously
executor.h tried to avoid relying on heapam.h, but it ended up
including heapam.h indirectly, which lead to a couple people
introducing new uses of the enum. We could just convert those to
ints like in other places, but I think moving to a smaller header
seems more appropriate. I don't think lockoptions.h is perfect, but
it's also not bad?

This required adding heapam.h includes to a bunch of places, but that
doesn't seem too bad. It'll possibly break a few external users, but I
think that's acceptable cost - many of those will already/will further
be broken in 12 anyway.

I think we should do the same with genam.h, but that seems better done
separately.

I've a pending local set of patches that splits relation_open/close,
heap_open/close et al into a separate set of includes, that then allows
to downgrade the heapam.h include to that new file (given that a large
percentage of the files really just want heap_open/close and nothing
else from heapam.h), which I'll rebase ontop of this if we can agree
that this change is a good idea.

Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
being in different files (in a3540b0f657c6352) - what do you think about
this change? I didn't revert that split, but I think we ought to
consider just relying on a forward declared struct in heapam.h instead,
this split is pretty confusing and seems to lead to more interdependence
in a lot of cases.

Greetings,

Andres Freund

Attachments:

0001-Don-t-include-heapam.h-from-others-headers.patchtext/x-diff; charset=us-asciiDownload+92-39
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#1)
Re: Reducing header interdependencies around heapam.h et al.

On 2019-Jan-13, Andres Freund wrote:

Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
being in different files (in a3540b0f657c6352) - what do you think about
this change? I didn't revert that split, but I think we ought to
consider just relying on a forward declared struct in heapam.h instead,
this split is pretty confusing and seems to lead to more interdependence
in a lot of cases.

I wasn't terribly happy with that split, so I'm not opposed to doing
things differently. For your consideration, I've had this patch lying
around for a few years, which (IIRC) reduces the exposure of heapam.h by
splitting relscan.h in two. This applies on top of dd778e9d8884 (and as
I recall it worked well there).

I'll try to have a look at your patch tomorrow.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#2)
Re: Reducing header interdependencies around heapam.h et al.

Hi,

On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:

On 2019-Jan-13, Andres Freund wrote:

Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
being in different files (in a3540b0f657c6352) - what do you think about
this change? I didn't revert that split, but I think we ought to
consider just relying on a forward declared struct in heapam.h instead,
this split is pretty confusing and seems to lead to more interdependence
in a lot of cases.

I wasn't terribly happy with that split, so I'm not opposed to doing
things differently. For your consideration, I've had this patch lying
around for a few years, which (IIRC) reduces the exposure of heapam.h by
splitting relscan.h in two. This applies on top of dd778e9d8884 (and as
I recall it worked well there).

You forgot to attach that patch... :).

I'm not sure I see a need to split relscan - note my patch makes it so
that it's not included by heapam.h anymore, and doing for the same for
genam.h would be fairly straightforward. The most interesting bit there
would be whether we'd add the includes necessary for Snapshot (imo no),
Relation (?), ScanKey (imo no), or whether to add the necessary includes
directly.

I'll try to have a look at your patch tomorrow.

Thanks!

Greetings,

Andres Freund

#4Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: Reducing header interdependencies around heapam.h et al.

On 2019-01-13 19:05:03 -0800, Andres Freund wrote:

Hi,

On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:

On 2019-Jan-13, Andres Freund wrote:

Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
being in different files (in a3540b0f657c6352) - what do you think about
this change? I didn't revert that split, but I think we ought to
consider just relying on a forward declared struct in heapam.h instead,
this split is pretty confusing and seems to lead to more interdependence
in a lot of cases.

I wasn't terribly happy with that split, so I'm not opposed to doing
things differently. For your consideration, I've had this patch lying
around for a few years, which (IIRC) reduces the exposure of heapam.h by
splitting relscan.h in two. This applies on top of dd778e9d8884 (and as
I recall it worked well there).

You forgot to attach that patch... :).

I'm not sure I see a need to split relscan

One split I am wondering about however is splitting out the sysstable_
stuff out of genam.h. It's imo a different component and shouldn't
really be in there. Would be quite a bit of rote work to add all the
necessary includes over the backend...

Greetings,

Andres Freund

#5Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: Reducing header interdependencies around heapam.h et al.

On 2019-01-13 19:05:03 -0800, Andres Freund wrote:

Hi,

On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:

On 2019-Jan-13, Andres Freund wrote:

Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
being in different files (in a3540b0f657c6352) - what do you think about
this change? I didn't revert that split, but I think we ought to
consider just relying on a forward declared struct in heapam.h instead,
this split is pretty confusing and seems to lead to more interdependence
in a lot of cases.

I wasn't terribly happy with that split, so I'm not opposed to doing
things differently. For your consideration, I've had this patch lying
around for a few years, which (IIRC) reduces the exposure of heapam.h by
splitting relscan.h in two. This applies on top of dd778e9d8884 (and as
I recall it worked well there).

You forgot to attach that patch... :).

I'm not sure I see a need to split relscan - note my patch makes it so
that it's not included by heapam.h anymore, and doing for the same for
genam.h would be fairly straightforward. The most interesting bit there
would be whether we'd add the includes necessary for Snapshot (imo no),
Relation (?), ScanKey (imo no), or whether to add the necessary includes
directly.

Here's a patch doing the same for genam as well.

Greetings,

Andres Freund

Attachments:

v2-0001-Don-t-include-heapam.h-from-others-headers.patchtext/x-diff; charset=us-asciiDownload+92-39
v2-0002-Make-naming-of-tupdesc-related-structs-more-consi.patchtext/x-diff; charset=us-asciiDownload+14-14
v2-0003-Don-t-include-genam.h-from-execnodes.h-anymore.patchtext/x-diff; charset=us-asciiDownload+45-24
#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#3)
Re: Reducing header interdependencies around heapam.h et al.

On 2019-Jan-13, Andres Freund wrote:

On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:

On 2019-Jan-13, Andres Freund wrote:

Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
being in different files (in a3540b0f657c6352) - what do you think about
this change? I didn't revert that split, but I think we ought to
consider just relying on a forward declared struct in heapam.h instead,
this split is pretty confusing and seems to lead to more interdependence
in a lot of cases.

I wasn't terribly happy with that split, so I'm not opposed to doing
things differently. For your consideration, I've had this patch lying
around for a few years, which (IIRC) reduces the exposure of heapam.h by
splitting relscan.h in two. This applies on top of dd778e9d8884 (and as
I recall it worked well there).

You forgot to attach that patch... :).

Oops :-( Here it is anyway. Notmuch reminded me that I had posted this
before, to a pretty cold reception:
/messages/by-id/20130917020228.GB7139@eldon.alvh.no-ip.org
Needless to say, I disagree with the general sentiment in that thread
that header refactoring is pointless and unwelcome.

I'm not sure I see a need to split relscan - note my patch makes it so
that it's not included by heapam.h anymore, and doing for the same for
genam.h would be fairly straightforward. The most interesting bit there
would be whether we'd add the includes necessary for Snapshot (imo no),
Relation (?), ScanKey (imo no), or whether to add the necessary includes
directly.

Ah, you managed to get heapam.h and genam.h out of execnodes.h, which I
think was my main motivation ... that seems good enough to me. I agree
that splitting relscan.h may not be necessary after these changes.

As for struct Relation, note that for that one you only need relcache.h
which should be lean enough, so it doesn't sound too bad to include that
one wherever.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

relscan_details.patchtext/x-diff; charset=us-asciiDownload+198-124
#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#4)
Re: Reducing header interdependencies around heapam.h et al.

On 2019-Jan-13, Andres Freund wrote:

One split I am wondering about however is splitting out the sysstable_
stuff out of genam.h. It's imo a different component and shouldn't
really be in there. Would be quite a bit of rote work to add all the
necessary includes over the backend...

Yeah -- unless there's a demonstrable win from this split, I would leave
this part alone, unless you regularly carry a shield to PG conferences.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#5)
Re: Reducing header interdependencies around heapam.h et al.

0001 -- looks reasonable. One hunk in executor.h changes LockTupleMode
to "enum LockTupleMode", but there's no need for that.

AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr)
split is so that it's possible to refer to structs without having
the full struct definition. I think changing uses of FooBar to "struct
FooBarData *" defeats the whole purpose -- it becomes pointless noise,
confusing the reader for no gain. I've long considered that the struct
definitions should appear in "internal" headers (such as
htup_details.h), separate from the pointer typedefs, so that it is the
forward struct declarations (and the pointer typedefs, where there are
any) that are part of the exposed API for each module, and not the
struct definitions.

I think that would be much more invasive, though, and it's unlikely to
succeed as easily as this simpler approach is.

I think MissingPtr is a terrible name. Can we change that while at this?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#8)
Re: Reducing header interdependencies around heapam.h et al.

Hi,

On 2019-01-14 15:36:14 -0300, Alvaro Herrera wrote:

0001 -- looks reasonable. One hunk in executor.h changes LockTupleMode
to "enum LockTupleMode", but there's no need for that.

Oh, that escaped from an earlier version where I briefly forgot that one
cannot portably forward-declare enums.

AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr)
split is so that it's possible to refer to structs without having
the full struct definition. I think changing uses of FooBar to "struct
FooBarData *" defeats the whole purpose -- it becomes pointless noise,
confusing the reader for no gain. I've long considered that the struct
definitions should appear in "internal" headers (such as
htup_details.h), separate from the pointer typedefs, so that it is the
forward struct declarations (and the pointer typedefs, where there are
any) that are part of the exposed API for each module, and not the
struct definitions.

I think the whole pointer hiding game that we play is a really really
bad idea. We should just about *never* have typedefs that hide the fact
that something is a pointer. But it's hard to go away from that for
legacy reasons.

The problem with your approach is that it's *eminently* reasonable to
want to forward declare a struct in multiple places. Otherwise you end
up in issues where you include headers like heapam.h solely for a
typedef, which obviously doesn't make a ton of sense.

If we were in C11 we could just forward declare the pointer hiding
typedefs in multiple places, and be done with that. But before C11
redundant typedefs aren't allowed. With the C99 move I'm however not
sure if there's any surviving supported compiler that doesn't allow
redundant typedefs as an extension.

Given the fact that including headers just for a typedef is frequent
overkill, hiding the typedef in a separate header has basically no
gain. I also don't quite understand why using a forward declaration with
struct in the name is that confusing, especially when it only happens in
the header.

I think MissingPtr is a terrible name. Can we change that while at
this?

Indeed. I'd just remove the typedef.

Greetings,

Andres Freund

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#9)
Re: Reducing header interdependencies around heapam.h et al.

On 2019-Jan-14, Andres Freund wrote:

I think the whole pointer hiding game that we play is a really really
bad idea. We should just about *never* have typedefs that hide the fact
that something is a pointer. But it's hard to go away from that for
legacy reasons.

The problem with your approach is that it's *eminently* reasonable to
want to forward declare a struct in multiple places. Otherwise you end
up in issues where you include headers like heapam.h solely for a
typedef, which obviously doesn't make a ton of sense.

Well, my point is that in an ideal world we would have a header where
the struct is declared once in a very lean header, which doesn't include
almost anything else, so you can include it into other headers
liberally. Then the struct definitions are any another (heavy) header,
which *does* need to include lots of other stuff in order to be able to
define the structs fully, and would be #included very sparingly, only in
the few .c files that really needed it.

For example, I would split up execnodes.h so that *only* the
typedef/struct declarations are there, and *no* struct definition. Then
that header can be #included in other headers that need those to declare
functions -- no problem. Another header (say execstructs.h, whatever)
would contain the struct definition and would only be used by executor
.c files. So when a struct changes, only the executor is recompiled;
the rest of the source doesn't care, because execnodes.h (the struct
decls) hasn't changed.

This may be too disruptive. I'm not suggesting that you do things this
way, only describing my ideal world.

Your idea of "liberally forward-declaring structs in multiple places"
seems like you don't *have* the lean header at all (only the heavy one
with all the struct definitions), and instead you distribute bits and
pieces of the lean header randomly to the places that need it. It's
repetitive. It gets the job done, but it's not *clean*.

Given the fact that including headers just for a typedef is frequent
overkill, hiding the typedef in a separate header has basically no
gain. I also don't quite understand why using a forward declaration with
struct in the name is that confusing, especially when it only happens in
the header.

Oh, that's not the confusing part -- that's just repetitive, nothing
more. What's confusing (IMO) is having two names for the same struct
(one pointer and one full struct). It'd be useful if it was used the
way I describe above. But that's the settled project style, so I don't
have any hopes that it'll ever be changed.

Anyway, I'm not objecting to your patch ... I just don't want it on
record that I'm in love with the current situation :-)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#10)
Re: Reducing header interdependencies around heapam.h et al.

Hi,

On 2019-01-14 17:55:44 -0300, Alvaro Herrera wrote:

On 2019-Jan-14, Andres Freund wrote:

I think the whole pointer hiding game that we play is a really really
bad idea. We should just about *never* have typedefs that hide the fact
that something is a pointer. But it's hard to go away from that for
legacy reasons.

The problem with your approach is that it's *eminently* reasonable to
want to forward declare a struct in multiple places. Otherwise you end
up in issues where you include headers like heapam.h solely for a
typedef, which obviously doesn't make a ton of sense.

Well, my point is that in an ideal world we would have a header where
the struct is declared once in a very lean header, which doesn't include
almost anything else, so you can include it into other headers
liberally. Then the struct definitions are any another (heavy) header,
which *does* need to include lots of other stuff in order to be able to
define the structs fully, and would be #included very sparingly, only in
the few .c files that really needed it.

For example, I would split up execnodes.h so that *only* the
typedef/struct declarations are there, and *no* struct definition. Then
that header can be #included in other headers that need those to declare
functions -- no problem. Another header (say execstructs.h, whatever)
would contain the struct definition and would only be used by executor
.c files. So when a struct changes, only the executor is recompiled;
the rest of the source doesn't care, because execnodes.h (the struct
decls) hasn't changed.

It's surely better than the current state, but it still requires
recompiling everything in a more cases than necessary.

This may be too disruptive. I'm not suggesting that you do things this
way, only describing my ideal world.

It'd probably doable by leaving execnodes.h as the heavyweight nodes,
and execnodetypes.h as the lightweight one, and including the latter
from the former. And then moving users of execnodes over to
execnodetypes.

Your idea of "liberally forward-declaring structs in multiple places"
seems like you don't *have* the lean header at all (only the heavy one
with all the struct definitions), and instead you distribute bits and
pieces of the lean header randomly to the places that need it. It's
repetitive. It gets the job done, but it's not *clean*.

I'm not really buying the repetitiveness bit - it's really primarily
adding 'struct ' prefix, and sometimes adding a 'Data *' postfix. That's
not a lot of duplication. When used in structs there's no need to even
add an explicit 'struct <name>;' forward declaration, that's only needed
for function parameters. And afterwards there's a lot less entanglement
- no need to recompile every file just because a new node type has been
added etc.

Given the fact that including headers just for a typedef is frequent
overkill, hiding the typedef in a separate header has basically no
gain. I also don't quite understand why using a forward declaration with
struct in the name is that confusing, especially when it only happens in
the header.

Oh, that's not the confusing part -- that's just repetitive, nothing
more. What's confusing (IMO) is having two names for the same struct
(one pointer and one full struct). It'd be useful if it was used the
way I describe above. But that's the settled project style, so I don't
have any hopes that it'll ever be changed.

Not within a few days, but we probably can do it over time...

Anyway, I'm not objecting to your patch ... I just don't want it on
record that I'm in love with the current situation :-)

Cool, I've pushed these now. I'll rebase my patch to split
(heap|reation)_(open|close)(rv)? patch out of heapam.[ch] now. Brr.

Greetings,

Andres Freund