Some efforts to get rid of "long" in our codebase
I was playing around with the following imperfect regex to try and get
an idea of how many "long" usages we have.
git grep -E "^\s*(static)?\s*(unsigned)?\s*long\b" -- '*.c' '*.h' | wc -l
REL_13_STABLE -> 486
REL_14_STABLE -> 482
REL_15_STABLE -> 476
REL_16_STABLE -> 485
REL_17_STABLE -> 449
REL_18_STABLE -> 439
master -> 386
(Generally trending downwards with quite a good reduction since v18)
Many of the "long" usages are genuine, e.g for compatibility with
library functions, or in snprintf.c for the %ld and %lu format
specifiers; those we need. However, many of these we don't need. Many
are a possible source of bugs due to not all platforms we support
having the same idea about how wide longs are. We've had and fixed
bugs around this before.
I've attached a couple of patches to get the ball rolling.
0001: CATCACHE_STATS is using signed longs as counters for cache
hits/misses etc. I've changed this to uint64 rather than int64 as I
didn't see the need to have a signed type to count the occurrences of
an event. Maybe there's an anti-universe where negative occurrences
are a thing, but they're not in this one. If something goes awry with
the counters and that causes the subtraction that's being done to
wrap, then we're more likely to notice the bug via the excessively
large number the wrap would end up displaying.
0002: MemSet / MemSetAligned macros. It's probably about time someone
made these disappear, but that's likely for another thread with more
research than I'd like to do here. I replaced "long" with "Size". I
also considered "uintptr_t", but after some reading of the C standard,
I settled on Size as it seems it's possible for platforms to exist
where the pointer width is smaller than the processor's width. I
suspect it might not matter for the platforms we support? Size could
also be smaller than the processor's width, but I feel that's less
likely.
Which yields a drop in the Ocean fewer longs...
0001+0002 -> 367
David
Attachments:
v1-0001-Get-rid-of-long-datatype-in-CATCACHE_STATS-enable.patchapplication/octet-stream; name=v1-0001-Get-rid-of-long-datatype-in-CATCACHE_STATS-enable.patchDownload+17-18
v1-0002-Adjust-MemSet-macro-to-use-Size-rather-than-long.patchapplication/octet-stream; name=v1-0002-Adjust-MemSet-macro-to-use-Size-rather-than-long.patchDownload+9-10
On 06/11/2025 13:46, David Rowley wrote:
I've attached a couple of patches to get the ball rolling.
Thanks!
0001: CATCACHE_STATS is using signed longs as counters for cache
hits/misses etc. I've changed this to uint64 rather than int64 as I
didn't see the need to have a signed type to count the occurrences of
an event. Maybe there's an anti-universe where negative occurrences
are a thing, but they're not in this one. If something goes awry with
the counters and that causes the subtraction that's being done to
wrap, then we're more likely to notice the bug via the excessively
large number the wrap would end up displaying.
Unsigned makes sense for these.
@@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
if (cache->cc_ntup == 0 && cache->cc_searches == 0) continue; /* don't print unused caches */ - elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits", + elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" PRIu64 " lhits", cache->cc_relname, cache->cc_indexoid, cache->cc_ntup,
Unfortunately PRIu64 makes these much longer and less readable. I don't
think there's much we can do about that though. Perhaps split the format
string to multiple lines?
0002: MemSet / MemSetAligned macros. It's probably about time someone
made these disappear, but that's likely for another thread with more
research than I'd like to do here. I replaced "long" with "Size". I
also considered "uintptr_t", but after some reading of the C standard,
I settled on Size as it seems it's possible for platforms to exist
where the pointer width is smaller than the processor's width. I
suspect it might not matter for the platforms we support? Size could
also be smaller than the processor's width, but I feel that's less
likely.
Let's use size_t instead of Size in new code, per previous discussions
(/messages/by-id/CA+RLCQyVqb9Xxni6x2iJYTawMbJv5gZY2fzNaw39=_yOtu_QKw@mail.gmail.com
- Heikki
On 06.11.25 13:17, Heikki Linnakangas wrote:
@@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
if (cache->cc_ntup == 0 && cache->cc_searches == 0) continue; /* don't print unused caches */ - elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits", + elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" PRIu64 " lhits", cache->cc_relname, cache->cc_indexoid, cache->cc_ntup,Unfortunately PRIu64 makes these much longer and less readable. I don't
think there's much we can do about that though. Perhaps split the format
string to multiple lines?
You could also use unsigned long long int for these, to make the format
strings more readable.
On 06.11.25 12:46, David Rowley wrote:
0002: MemSet / MemSetAligned macros. It's probably about time someone
made these disappear, but that's likely for another thread with more
research than I'd like to do here. I replaced "long" with "Size". I
also considered "uintptr_t", but after some reading of the C standard,
I settled on Size as it seems it's possible for platforms to exist
where the pointer width is smaller than the processor's width. I
suspect it might not matter for the platforms we support? Size could
also be smaller than the processor's width, but I feel that's less
likely.
I think size_t/Size could be misleading here. You're not measuring any
size, you're just chunking up the bytes to zero into something that we
thing the compiler or CPU can handle very efficiently.
So in a sense, using long isn't wrong here. It might well be the best
for this. If there is an aversion to using any long at all, why not
long long.
On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
On 06.11.25 12:46, David Rowley wrote:
0002: MemSet / MemSetAligned macros. It's probably about time someone
made these disappear, but that's likely for another thread with more
research than I'd like to do here. I replaced "long" with "Size". I
also considered "uintptr_t", but after some reading of the C standard,
I settled on Size as it seems it's possible for platforms to exist
where the pointer width is smaller than the processor's width. I
suspect it might not matter for the platforms we support? Size could
also be smaller than the processor's width, but I feel that's less
likely.I think size_t/Size could be misleading here. You're not measuring any
size, you're just chunking up the bytes to zero into something that we
thing the compiler or CPU can handle very efficiently.So in a sense, using long isn't wrong here. It might well be the best
for this. If there is an aversion to using any long at all, why not
long long.
The point in changing this was that long isn't a good fit as it's not
64-bit on 64-bit Windows. My aim is to find a type with a width the
same as the processor's word size. long long tends to be 64-bit on
32-bit platforms, so doesn't seem a very good fit. Someone might argue
that doesn't matter now since we no longer have 4-byte Datums, but
IMO, this isn't any reason to make things even worse for 32-bit
builds.
David
On 06.11.25 21:06, David Rowley wrote:
On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
On 06.11.25 12:46, David Rowley wrote:
0002: MemSet / MemSetAligned macros. It's probably about time someone
made these disappear, but that's likely for another thread with more
research than I'd like to do here. I replaced "long" with "Size". I
also considered "uintptr_t", but after some reading of the C standard,
I settled on Size as it seems it's possible for platforms to exist
where the pointer width is smaller than the processor's width. I
suspect it might not matter for the platforms we support? Size could
also be smaller than the processor's width, but I feel that's less
likely.I think size_t/Size could be misleading here. You're not measuring any
size, you're just chunking up the bytes to zero into something that we
thing the compiler or CPU can handle very efficiently.So in a sense, using long isn't wrong here. It might well be the best
for this. If there is an aversion to using any long at all, why not
long long.The point in changing this was that long isn't a good fit as it's not
64-bit on 64-bit Windows. My aim is to find a type with a width the
same as the processor's word size. long long tends to be 64-bit on
32-bit platforms, so doesn't seem a very good fit. Someone might argue
that doesn't matter now since we no longer have 4-byte Datums, but
IMO, this isn't any reason to make things even worse for 32-bit
builds.
Yeah you're right. If we cared a lot about MemSet's longevity, I would
suggest making a new type for this, but that would leak into all users
of c.h, so maybe not.
I do suggest some kind of comment, that we're using size_t as a
convenient proxy for the most suitable chunk/step size for the platform,
not to actually measure a size.
Peter Eisentraut <peter@eisentraut.org> writes:
I do suggest some kind of comment, that we're using size_t as a
convenient proxy for the most suitable chunk/step size for the platform,
not to actually measure a size.
Yeah. There's not actually anything wrong with using "long" here,
except that we believe that on Win64 it's probably not the processor's
native word width. Size/size_t is more likely to match that.
([u]intptr_t is even more likely to match, but I don't think that's a
good choice because it invites confusion with the usage of uintptr_t
for address arithmetic elsewhere in the macro. That's quite unrelated
to the choice of copy step size.)
regards, tom lane
On Fri, 7 Nov 2025 at 07:26, Peter Eisentraut <peter@eisentraut.org> wrote:
On 06.11.25 13:17, Heikki Linnakangas wrote:
@@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
if (cache->cc_ntup == 0 && cache->cc_searches == 0) continue; /* don't print unused caches */ - elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits", + elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" PRIu64 " lhits", cache->cc_relname, cache->cc_indexoid, cache->cc_ntup,Unfortunately PRIu64 makes these much longer and less readable. I don't
think there's much we can do about that though. Perhaps split the format
string to multiple lines?You could also use unsigned long long int for these, to make the format
strings more readable.
I couldn't really decide on what's best here. I'd personally rather be
more explicit about the width of the type by using uint64, but on the
other hand, I don't know of any realistic modern hardware/compiler
combination where unsigned long long isn't 64-bit. Certainly, the
format string is easier to read with %llu.
v2-0001 wraps the format string as suggested by Heikki, v3-0001 uses
unsigned long long as suggested by Peter.
v2-0002 is updated to use size_t instead of Size, per Heikki
Any further opinions or votes on v2-0001 vs v3-0001?
David
Attachments:
v3-0001-Get-rid-of-long-datatype-in-CATCACHE_STATS-enable.patch.txttext/plain; charset=US-ASCII; name=v3-0001-Get-rid-of-long-datatype-in-CATCACHE_STATS-enable.patch.txtDownload+17-18
v2-0001-Get-rid-of-long-datatype-in-CATCACHE_STATS-enable.patchapplication/octet-stream; name=v2-0001-Get-rid-of-long-datatype-in-CATCACHE_STATS-enable.patchDownload+23-18
v2-0002-Adjust-MemSet-macro-to-use-size_t-rather-than-lon.patchapplication/octet-stream; name=v2-0002-Adjust-MemSet-macro-to-use-size_t-rather-than-lon.patchDownload+19-20
On 11/6/25 12:46 PM, David Rowley wrote:
I've attached a couple of patches to get the ball rolling.
This inspired me to write my own patch to chip away at the use of long.
The target for the attached patch is TimestampDifference() and
feTimestampDifference() which return the timestamp difference in seconds
and microseconds as a long and an int and replace it with in64 and
int32. The return values of these functions are used a lot for printing
durations to the log and sometimes to populate a struct timeval (in that
case returning a timeval would make some sense but time_t is sadly 32
bits on some platforms).
The patch also cleans up code a bit in check_log_duration() which was
unnecessarily complicated.
But writing the mail made me wonder if a cleaner solution wouldn't be to
just make TimestampDifference() return a TimeOffset and then write
helper functions to e.g. convert a TimeOffset to a timeval or to
milliseconds.
Andreas
Attachments:
0001-Stop-using-long-TimestampDifference.patchtext/x-patch; charset=UTF-8; name=0001-Stop-using-long-TimestampDifference.patchDownload+67-75
On Mon, 10 Nov 2025 at 13:53, David Rowley <dgrowleyml@gmail.com> wrote:
v2-0001 wraps the format string as suggested by Heikki, v3-0001 uses
unsigned long long as suggested by Peter.v2-0002 is updated to use size_t instead of Size, per Heikki
Any further opinions or votes on v2-0001 vs v3-0001?
Nobody seems to feel strongly either way, so I looked again and
thought that using uint64 is nicer as the size of the type is
explicit. We do want a 64-bit type here, not something bigger, which
in theory, long long could be. The less readable format, IMO seemed
like an ok trade-off to be explicit about the type's size.
With that, I pushed the CATCACHE_STATS patch. I also pushed the
MemSet/MemSetAligned one too.
Thanks for looking.
David