split TOAST support out of postgres.h
Most backend code doesn't actually need the variable-length data types
support (TOAST support) in postgres.h. So I figured we could try to put
it into a separate header file. That makes postgres.h more manageable,
and it avoids including a bunch of complicated unused stuff everywhere.
I picked "varatt.h" as the name. Then we could either
1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That
way we clean up the files a bit but don't change any external interfaces.
2) Just let everyone who needs it include the new file.
3) Compromise: You can avoid most "damage" by having fmgr.h include
varatt.h. That satisfies most data types and extension code. That way,
there are only a few places that need an explicit include of varatt.h.
I went with the last option in my patch.
Thoughts?
Attachments:
0001-New-header-varatt.h-split-off-from-postgres.h.patchtext/plain; charset=UTF-8; name=0001-New-header-varatt.h-split-off-from-postgres.h.patchDownload+21-928
On Wed, 28 Dec 2022 at 08:07, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:
Most backend code doesn't actually need the variable-length data types
support (TOAST support) in postgres.h. So I figured we could try to put
it into a separate header file. That makes postgres.h more manageable,
and it avoids including a bunch of complicated unused stuff everywhere.
I picked "varatt.h" as the name. Then we could either
[…]
I went with the last option in my patch.
Thoughts?
This is a bit of a bikeshed suggestion, but I'm wondering if you considered
calling it toast.h? Only because the word is so distinctive within
Postgres; everybody knows exactly to what it refers.
I definitely agree with the principle of organizing and splitting up the
header files. Personally, I don't mind importing a bunch of headers if I'm
using a bunch of subsystems so I would be OK with needing to import this
new header if I need it.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
... Then we could either
1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That
way we clean up the files a bit but don't change any external interfaces.
2) Just let everyone who needs it include the new file.
3) Compromise: You can avoid most "damage" by having fmgr.h include
varatt.h. That satisfies most data types and extension code. That way,
there are only a few places that need an explicit include of varatt.h.
I went with the last option in my patch.
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?
regards, tom lane
Hi,
I've thought about this while working on Pluggable TOAST and 64-bit
TOAST value ID myself. Agree with #2 to seem the best of the above.
There are not so many places where a new header should be included.
On Wed, Dec 28, 2022 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
... Then we could either
1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That
way we clean up the files a bit but don't change any external interfaces.2) Just let everyone who needs it include the new file.
3) Compromise: You can avoid most "damage" by having fmgr.h include
varatt.h. That satisfies most data types and extension code. That way,
there are only a few places that need an explicit include of varatt.h.I went with the last option in my patch.
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?regards, tom lane
--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
Hi,
On 2022-12-28 09:07:12 -0500, Isaac Morland wrote:
This is a bit of a bikeshed suggestion, but I'm wondering if you considered
calling it toast.h? Only because the word is so distinctive within
Postgres; everybody knows exactly to what it refers.
We have a bunch of toast*.h files already. The new header should pretty much
only contain the types, given how widely the header is going to be
included. So maybe toast_type.h?
Greetings,
Andres Freund
On Thu, 29 Dec 2022 at 18:16, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-12-28 09:07:12 -0500, Isaac Morland wrote:
This is a bit of a bikeshed suggestion, but I'm wondering if you considered
calling it toast.h? Only because the word is so distinctive within
Postgres; everybody knows exactly to what it refers.We have a bunch of toast*.h files already. The new header should pretty much
only contain the types, given how widely the header is going to be
included. So maybe toast_type.h?
My 2 cents: I don't think that toast_anything.h is appropriate,
because even though the varatt infrastructure does enable
externally-stored oversized attributes (which is the essence of
TOAST), this is not the only (or primary) use of the type.
Example: Indexes do not (can not?) support toasted values, but
generally do support variable length attributes that would be pulled
in with varatt.h. I don't see why we'd call the headers of
variable-length attributes after one small - but not insignifcant -
use case.
Kind regards,
Matthias van de Meent
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
On Thu, 29 Dec 2022 at 18:16, Andres Freund <andres@anarazel.de> wrote:
We have a bunch of toast*.h files already. The new header should pretty much
only contain the types, given how widely the header is going to be
included. So maybe toast_type.h?
My 2 cents: I don't think that toast_anything.h is appropriate,
because even though the varatt infrastructure does enable
externally-stored oversized attributes (which is the essence of
TOAST), this is not the only (or primary) use of the type.
+1 ... varatt.h sounded fine to me. I'd suggest varlena.h except
we have one already.
regards, tom lane
On 28.12.22 16:07, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
... Then we could either
1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That
way we clean up the files a bit but don't change any external interfaces.2) Just let everyone who needs it include the new file.
3) Compromise: You can avoid most "damage" by having fmgr.h include
varatt.h. That satisfies most data types and extension code. That way,
there are only a few places that need an explicit include of varatt.h.I went with the last option in my patch.
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?
See this incremental patch set.
It seems like maybe there is some intermediate abstraction that a lot of
these places should be using that we haven't thought of yet.
Attachments:
v2-0001-New-header-varatt.h-split-off-from-postgres.h.patchtext/plain; charset=UTF-8; name=v2-0001-New-header-varatt.h-split-off-from-postgres.h.patchDownload+21-928
v2-0002-Move-varatt.h-include-to-individual-files.patchtext/plain; charset=UTF-8; name=v2-0002-Move-varatt.h-include-to-individual-files.patchDownload+41-3
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?
See this incremental patch set.
Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.
(You did check that this passes cpluspluscheck/headerscheck, right?)
It seems like maybe there is some intermediate abstraction that a lot of
these places should be using that we haven't thought of yet.
Hmm. Perhaps, but I think I'm content with this version of the patch.
regards, tom lane
On 2022-12-30 Fr 11:50, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?See this incremental patch set.
Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.(You did check that this passes cpluspluscheck/headerscheck, right?)
It seems like maybe there is some intermediate abstraction that a lot of
these places should be using that we haven't thought of yet.Hmm. Perhaps, but I think I'm content with this version of the patch.
Looked good to me too.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 30.12.22 17:50, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?See this incremental patch set.
Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.
committed
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
On 30.12.22 17:50, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?See this incremental patch set.
Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.committed
SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse. I would revert this.
On 10.01.23 08:39, Noah Misch wrote:
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
On 30.12.22 17:50, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?See this incremental patch set.
Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.committed
SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse. I would revert this.
Well, that was sort of my thinking, but people seemed to like this. I'm
happy to consider alternatives.
On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.committed
SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse. I would revert this.Well, that was sort of my thinking, but people seemed to like this. I'm
happy to consider alternatives.
I don't think that the number of extensions that get broken is really
the right metric. It's not fantastic to break large numbers of
extensions, of course, but if the solution is merely to add an #if
PG_VERSION_NUM >= whatever #include "newstuff" #endif then I don't
think it's really an issue. If an extension doesn't have an author who
can do at least that much updating when a new PG release comes out,
then it's basically unmaintained, and I just don't feel that bad about
breaking unmaintained extensions now and then, even annually.
Of course, if we go and remove something that's very widely used and
for which there's no simple workaround, that sucks. Say, removing
LWLocks entirely. But we don't usually do that sort of thing unless
there's a good reason and significant benefits.
I don't think it would be very nice to do something like this in a
minor release. But in a new major release, I think it's fine. I've
been on the hook to maintain extensions in the face of these kinds of
changes at various times over the years, and it's never taken me much
time.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse. I would revert this.
Well, that was sort of my thinking, but people seemed to like this. I'm
happy to consider alternatives.
I don't think it would be very nice to do something like this in a
minor release. But in a new major release, I think it's fine. I've
been on the hook to maintain extensions in the face of these kinds of
changes at various times over the years, and it's never taken me much
time.
Yeah, that was my thinking. We could never do any header refactoring
at all if the standard is "will some extension author need to add a #if".
In practice, we make bigger adjustments than this all the time,
both in header layout and in individual function APIs.
Now, there is a fair question whether splitting this code out of
postgres.h is worth any trouble at all. TBH my initial reaction
had been "no". But once we found that only 40-ish backend files
need to read this new header, I became a "yes" vote because it
seems clear that there will be a total-compilation-time benefit.
regards, tom lane
On Tue, Jan 10, 2023 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Now, there is a fair question whether splitting this code out of
postgres.h is worth any trouble at all. TBH my initial reaction
had been "no". But once we found that only 40-ish backend files
need to read this new header, I became a "yes" vote because it
seems clear that there will be a total-compilation-time benefit.
I wasn't totally about this, either, but I think on balance it's
probably a smart thing to do. I always found it kind of weird that we
put that stuff in postgres.h. It seems to privilege the TOAST
mechanism to an undue degree; what's the argument, for example, that
TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS()
or LWLockAcquire or HeapTuple? It felt to me like we'd just decided
that one subsystem gets to go into the main header file and everybody
else just had to have their own headers.
One thing that's particularly awkward about that is that if you want
to write some front-end code that knows about how varlenas are stored
on disk, it was very awkward with the old structure. You're not
supposed to include "postgres.h" into frontend code, but if the stuff
you need is defined there then what else can you do? So I generally
think that anything that's in postgres.h should have a strong claim to
be not only widely-needed in the backend, but also never needed at all
in any other executable.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 10, 2023 at 12:00:46PM -0500, Robert Haas wrote:
On Tue, Jan 10, 2023 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Now, there is a fair question whether splitting this code out of
postgres.h is worth any trouble at all. TBH my initial reaction
had been "no". But once we found that only 40-ish backend files
need to read this new header, I became a "yes" vote because it
seems clear that there will be a total-compilation-time benefit.
A time claim with no benchmarks is a red flag. I've chosen to run one:
export CCACHE_DISABLE=1
change=d952373a987bad331c0e499463159dd142ced1ef
for commit in $change $change^; do
echo === git checkout $commit
git checkout $commit
for n in `seq 1 200`; do make -j20 clean; env time make -j20; done
done
Results:
commit median mean count
d952373a987bad331c0e499463159dd142ced1ef 49.35 49.37 200
d952373a987bad331c0e499463159dd142ced1ef^ 49.33 49.36 200
That is to say, the patch made the build a bit slower, not faster. That's
with GCC 4.8.5 (RHEL 7). I likely should have interleaved the run types, but
in any case the speed win didn't show up.
I wasn't totally about this, either, but I think on balance it's
probably a smart thing to do. I always found it kind of weird that we
put that stuff in postgres.h. It seems to privilege the TOAST
mechanism to an undue degree; what's the argument, for example, that
TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS()
or LWLockAcquire or HeapTuple? It felt to me like we'd just decided
that one subsystem gets to go into the main header file and everybody
else just had to have their own headers.One thing that's particularly awkward about that is that if you want
to write some front-end code that knows about how varlenas are stored
on disk, it was very awkward with the old structure. You're not
supposed to include "postgres.h" into frontend code, but if the stuff
you need is defined there then what else can you do? So I generally
think that anything that's in postgres.h should have a strong claim to
be not only widely-needed in the backend, but also never needed at all
in any other executable.
If the patch had just made postgres.h include varatt.h, like it does elog.h,
I'd consider that change a nonnegative. Grouping things is nice, even if it
makes compilation a bit slower. That also covers your frontend use case. How
about that?
I agree fixing any one extension is easy enough. Thinking back to the
htup_details.h refactor, I found the aggregate pain unreasonable relative to
alleged benefits, even though each individual extension wasn't too bad.
SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).
On Wed, Jan 11, 2023 at 1:14 AM Noah Misch <noah@leadboat.com> wrote:
If the patch had just made postgres.h include varatt.h, like it does elog.h,
I'd consider that change a nonnegative. Grouping things is nice, even if it
makes compilation a bit slower. That also covers your frontend use case. How
about that?
I'm not direly opposed to that, but I'm also unconvinced that having
the varatt.h stuff is important enough relative to other things to
justify giving it a privileged place forever.
I agree fixing any one extension is easy enough. Thinking back to the
htup_details.h refactor, I found the aggregate pain unreasonable relative to
alleged benefits, even though each individual extension wasn't too bad.
SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).
What annoyed me about that refactoring is that, in most cases where I
had been including htup.h, I had to change it to include
htup_details.h. In the main source tree, too, we've got 31 places that
include access/htup.h and 241 that include access/htup_details.h. It's
hard to argue that it was worth splitting the file given those
numbers, and in fact I think that change was a mistake. But that
doesn't mean every such change is a mistake.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 10.01.23 09:48, Peter Eisentraut wrote:
On 10.01.23 08:39, Noah Misch wrote:
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
On 30.12.22 17:50, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is
included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?See this incremental patch set.
Wow, 41 files requiring varatt.h is a lot fewer than I would have
guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.committed
SET_VARSIZE alone appears in 74 pgxn distributions, so I predict
extension
breakage en masse. I would revert this.Well, that was sort of my thinking, but people seemed to like this. I'm
happy to consider alternatives.
Given the subsequent discussion, I'll keep it as is for now but consider
it a semi-open item. It's easy to change.