Improving and extending int128.h to more of numeric.c
Attached are some improvements to include/common/int128.h, including
some new functions that allow it to be used more widely in numeric.c.
In particular, this allows various aggregates to use 128-bit integers
regardless of whether they're natively supported, which should improve
the performance on platforms lacking native 128-bit support, and it
also significantly simplifies a lot of numeric code, by making it the
same on all platforms.
0001 is a trivial bug fix for the test code in src/tools/testint128.c
-- it was using "union" instead of "struct" for test128.hl, which
meant that it was only ever setting and checking half of each 128-bit
integer in the tests.
0002 is a bit of preparatory refactoring of int128.h -- instead of
having all the native implementations at the top of the file, and the
non-native implementations at the bottom, this brings them together
(more like include/common/int.h). IMO, this makes it easier to work
on, since the native and non-native code is now adjacent inside each
function body, and it's not necessary to duplicate every function
comment and declaration, and it's easier to see that every function
has both implementations. Also, if we ever wanted to hand-code a
particular function to be the same on all platforms, it would be
easier with the file laid out this way. Although this means there are
now more #if's and #else's, it reduces the overall file size, and IMO
improves readability and maintainability.
0003 optimises the non-native addition code. Specifically, the test
for whether it needs to propagate a carry to the high part can be made
much simpler by noting that the low-part addition is unsigned integer
arithmetic, which is just modular arithmetic, so all it needs to do is
check for modular wrap-around, which can be done with a single "new <
old" test. In addition, it's possible to code this in a way that is
typically branchless, and produces the same machine code as the native
int128 code (e.g., an ADD and an ADC instruction). For me, this
significantly reduces the runtime of testint128 (from 31s to 16s).
0004 simplifies the non-native multiplication code a bit by using
signed integer multiplication for the first three product terms, which
simplifies the code needed to add the products to the result. Looking
on godbolt.org, this typically leads to significantly smaller output,
with less branching, though I found it only gave around a 3%
improvement to the runtime of testint128. Nonetheless, I still think
it's worth doing, to make the code simpler and more readable.
0005 is the main patch. It adds a few more functions to int128.h and
uses them in numeric.c to allow various functions (mainly aggregate
functions) to use 128-bit integers unconditionally on all platforms.
This applies to the following aggregates:
- sum(int8)
- avg(int8)
- stddev_pop(int4)
- stddev_samp(int4)
- var_pop(int4)
- var_samp(int4)
Excluding the new test code, 0005 gives a slight net reduction in the
total line count, and eliminates nearly all "#ifdef HAVE_INT128"
conditional code from numeric.c, making it significantly simpler and
easier to follow.
Testing on a 32-bit system without native int128 support, I see
something like a 1.3-1.5x speedup in a couple of simple queries using
those aggregates.
Regards,
Dean
Attachments:
v1-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchtext/x-patch; charset=US-ASCII; name=v1-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchDownload+21-28
v1-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchDownload+42-71
v1-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchtext/x-patch; charset=US-ASCII; name=v1-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchDownload+15-21
v1-0005-Extend-int128.h-to-support-more-numeric-code.patchtext/x-patch; charset=US-ASCII; name=v1-0005-Extend-int128.h-to-support-more-numeric-code.patchDownload+460-391
v1-0001-Fix-incorrectly-defined-test128-union-in-testint1.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-incorrectly-defined-test128-union-in-testint1.patchDownload+1-2
On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
0001 is a trivial bug fix for the test code in src/tools/testint128.c
-- it was using "union" instead of "struct" for test128.hl, which
meant that it was only ever setting and checking half of each 128-bit
integer in the tests.
Hi Dean, I went to take a look at this and got stuck at building the
test file. The usual pointing gcc to the src and build include
directories didn't cut it. How did you get it to work?
--
John Naylor
Amazon Web Services
On Wed, 9 Jul 2025 at 07:41, John Naylor <johncnaylorls@gmail.com> wrote:
Hi Dean, I went to take a look at this and got stuck at building the
test file. The usual pointing gcc to the src and build include
directories didn't cut it. How did you get it to work?
Yes, it wasn't immediately obvious how to do it. I first built
postgres as normal, including the pg_config tool, and then used that
to compile the test as follows:
gcc -O3 -g \
src/tools/testint128.c \
-I$(pg_config --includedir-server) \
-o src/tools/testint128 \
$(pg_config --libs)
It actually only needs -lpgcommon -lpgport -lm, but it seemed easier
just to include all of the pg_config --libs.
Regards,
Dean
Hi,
On 2025-07-09 10:38:31 +0100, Dean Rasheed wrote:
On Wed, 9 Jul 2025 at 07:41, John Naylor <johncnaylorls@gmail.com> wrote:
Hi Dean, I went to take a look at this and got stuck at building the
test file. The usual pointing gcc to the src and build include
directories didn't cut it. How did you get it to work?Yes, it wasn't immediately obvious how to do it. I first built
postgres as normal, including the pg_config tool, and then used that
to compile the test as follows:gcc -O3 -g \
src/tools/testint128.c \
-I$(pg_config --includedir-server) \
-o src/tools/testint128 \
$(pg_config --libs)It actually only needs -lpgcommon -lpgport -lm, but it seemed easier
just to include all of the pg_config --libs.
I think we should wire this up to the buildsystem and our testsuite... Having
testcode that is not run automatically may be helpful while originally
developing something, but it doesn't do anything to detect portability issues
or regressions.
Greetings,
Andres Freund
On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
I think we should wire this up to the buildsystem and our testsuite... Having
testcode that is not run automatically may be helpful while originally
developing something, but it doesn't do anything to detect portability issues
or regressions.
Yes, perhaps we should convert src/tools/testint128.c into a new test
extension, src/test/modules/test_int128
Regards,
Dean
On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
I think we should wire this up to the buildsystem and our testsuite... Having
testcode that is not run automatically may be helpful while originally
developing something, but it doesn't do anything to detect portability issues
or regressions.Yes, perhaps we should convert src/tools/testint128.c into a new test
extension, src/test/modules/test_int128
Here's an update doing that (in 0001). 0002-0005 are unchanged.
Regards,
Dean
Attachments:
v2-0005-Extend-int128.h-to-support-more-numeric-code.patchtext/x-patch; charset=US-ASCII; name=v2-0005-Extend-int128.h-to-support-more-numeric-code.patchDownload+460-391
v2-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchDownload+42-71
v2-0001-Convert-src-tools-testint128.c-into-a-test-module.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Convert-src-tools-testint128.c-into-a-test-module.patchDownload+86-5
v2-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchDownload+15-21
v2-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchDownload+21-28
On Thu, 10 Jul 2025 at 15:06, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
Yes, perhaps we should convert src/tools/testint128.c into a new test
extension, src/test/modules/test_int128Here's an update doing that (in 0001). 0002-0005 are unchanged.
v3 attached, fixing a couple of issues revealed by the cfbot:
1. The tests, as currently written, require a native int128 type to
run. To fix that, for now at least, skip the tests if the platform
lacks a native int128 type. We could perhaps improve on that by using
numerics to compute the expected results.
2. Fix Visual Studio compiler warning about applying a unary minus
operator to an unsigned type.
Regards,
Dean
Attachments:
v3-0001-Convert-src-tools-testint128.c-into-a-test-module.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Convert-src-tools-testint128.c-into-a-test-module.patchDownload+108-5
v3-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchDownload+42-71
v3-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchtext/x-patch; charset=US-ASCII; name=v3-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchDownload+21-28
v3-0005-Extend-int128.h-to-support-more-numeric-code.patchtext/x-patch; charset=US-ASCII; name=v3-0005-Extend-int128.h-to-support-more-numeric-code.patchDownload+460-391
v3-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchDownload+15-21
On Thu, Jul 10, 2025 at 9:06 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
I think we should wire this up to the buildsystem and our testsuite... Having
testcode that is not run automatically may be helpful while originally
developing something, but it doesn't do anything to detect portability issues
or regressions.Yes, perhaps we should convert src/tools/testint128.c into a new test
extension, src/test/modules/test_int128Here's an update doing that (in 0001). 0002-0005 are unchanged.
(Looking at v3) The new test module runs 10 million rather than a
billion iterations. That still takes 1.2s (after 0005), which seems
excessive for regular buildfarm testing. It seems like we could get by
with fewer than that, by using the time of day for the PRNG seed
(which would also need to be logged on error).
On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
0002 is a bit of preparatory refactoring of int128.h -- instead of
having all the native implementations at the top of the file, and the
non-native implementations at the bottom, this brings them together
(more like include/common/int.h).
+1
0003 optimises the non-native addition code. Specifically, the test
for whether it needs to propagate a carry to the high part can be made
much simpler by noting that the low-part addition is unsigned integer
arithmetic, which is just modular arithmetic, so all it needs to do is
check for modular wrap-around, which can be done with a single "new <
old" test. In addition, it's possible to code this in a way that is
typically branchless, and produces the same machine code as the native
int128 code (e.g., an ADD and an ADC instruction). For me, this
significantly reduces the runtime of testint128 (from 31s to 16s).
I see 1/3 less time with the new module, but still noticeably better.
0004 simplifies the non-native multiplication code a bit by using
signed integer multiplication for the first three product terms, which
simplifies the code needed to add the products to the result. Looking
on godbolt.org, this typically leads to significantly smaller output,
with less branching, though I found it only gave around a 3%
improvement to the runtime of testint128. Nonetheless, I still think
it's worth doing, to make the code simpler and more readable.
+1
0005 is the main patch. It adds a few more functions to int128.h and
uses them in numeric.c to allow various functions (mainly aggregate
functions) to use 128-bit integers unconditionally on all platforms.
This applies to the following aggregates:- sum(int8)
- avg(int8)
- stddev_pop(int4)
- stddev_samp(int4)
- var_pop(int4)
- var_samp(int4)Excluding the new test code, 0005 gives a slight net reduction in the
total line count, and eliminates nearly all "#ifdef HAVE_INT128"
conditional code from numeric.c, making it significantly simpler and
easier to follow.
I haven't looked too closely, but wanted to point out:
+ /* check 128/32-bit division */
+ t3.hl.hi = x;
+ t3.hl.lo = y;
+ t1.i128 = t3.i128 / z32;
+ r1 = (int32) (t3.i128 % z32);
+ t2 = t3;
+ int128_div_mod_int32(&t2.I128, z32, &r2);
+
+ if (t1.hl.hi != t2.hl.hi || t1.hl.lo != t2.hl.lo)
+ {
+ printf("%016lX%016lX / signed %lX\n", t3.hl.hi, t3.hl.lo, z32);
On gcc 14.3 -Og this gives
warning: format ‘%lX’ expects argument of type ‘long unsigned int’,
but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=]
...and printing r1 and r2 has the same warnings.
+ if (r1 != r2)
+ {
+ printf("%016lX%016lX % signed %lX\n", t3.hl.hi, t3.hl.lo, z32);
And this gives the above plus
warning: ' ' flag used with ‘%s’ gnu_printf format [-Wformat=]
warning: format ‘%s’ expects argument of type ‘char *’, but argument 4
has type ‘int32’ {aka ‘int’} [-Wformat=]
Testing on a 32-bit system without native int128 support, I see
something like a 1.3-1.5x speedup in a couple of simple queries using
those aggregates.
Nice!
--
John Naylor
Amazon Web Services
On Mon, 14 Jul 2025 at 11:22, John Naylor <johncnaylorls@gmail.com> wrote:
(Looking at v3) The new test module runs 10 million rather than a
billion iterations. That still takes 1.2s (after 0005), which seems
excessive for regular buildfarm testing. It seems like we could get by
with fewer than that, by using the time of day for the PRNG seed
(which would also need to be logged on error).
Thanks for looking!
I have reduced the number of iterations and changed it to use the
current time for the PRNG seed. I don't see much value in logging the
seed though, since we already log the inputs that cause any failure.
0005 is the main patch.
I haven't looked too closely, but wanted to point out:
warning: format ‘%lX’ expects argument of type ‘long unsigned int’,
but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=]...
Ah yes, thanks for pointing that out.
(The cfbot reports the same warnings, but you have to scroll through a
lot of output to see them. It would be nice if the commitfest app had
an indicator to show if there were any compiler warnings.)
v4 attached.
Regards,
Dean
Attachments:
v4-0001-Convert-src-tools-testint128.c-into-a-test-module.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Convert-src-tools-testint128.c-into-a-test-module.patchDownload+111-6
v4-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchtext/x-patch; charset=US-ASCII; name=v4-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchDownload+21-28
v4-0005-Extend-int128.h-to-support-more-numeric-code.patchtext/x-patch; charset=US-ASCII; name=v4-0005-Extend-int128.h-to-support-more-numeric-code.patchDownload+460-391
v4-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchDownload+15-21
v4-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchDownload+42-71
On Mon, 14 Jul 2025 at 22:07, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
warning: format ‘%lX’ expects argument of type ‘long unsigned int’,
but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=]v4 attached.
v5 attached, fixing some more printf-related compiler warnings, this
time from the original test code.
Regards,
Dean
Attachments:
v5-0001-Convert-src-tools-testint128.c-into-a-test-module.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Convert-src-tools-testint128.c-into-a-test-module.patchDownload+125-19
v5-0005-Extend-int128.h-to-support-more-numeric-code.patchtext/x-patch; charset=US-ASCII; name=v5-0005-Extend-int128.h-to-support-more-numeric-code.patchDownload+460-391
v5-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patchDownload+42-71
v5-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchtext/x-patch; charset=US-ASCII; name=v5-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patchDownload+21-28
v5-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Optimise-non-native-128-bit-addition-in-int128.h.patchDownload+15-21
Hi,
On 2025-07-14 22:07:38 +0100, Dean Rasheed wrote:
(The cfbot reports the same warnings, but you have to scroll through a
lot of output to see them. It would be nice if the commitfest app had
an indicator to show if there were any compiler warnings.)
FWIW, for many warnings the CompilerWarnings task will fail. It's "just" the
32bit build and msvc windows builds that currently don't...
There was a patch adding it for the msvc build at some point, but ...
Greetings,
Andres Freund
On Tue, Jul 15, 2025 at 4:07 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
I have reduced the number of iterations and changed it to use the
current time for the PRNG seed. I don't see much value in logging the
seed though, since we already log the inputs that cause any failure.
Ah, right.
On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
0005 is the main patch. It adds a few more functions to int128.h and
uses them in numeric.c to allow various functions (mainly aggregate
functions) to use 128-bit integers unconditionally on all platforms.
This applies to the following aggregates:- sum(int8)
- avg(int8)
- stddev_pop(int4)
- stddev_samp(int4)
- var_pop(int4)
- var_samp(int4)
Testing on a 32-bit system without native int128 support, I see
something like a 1.3-1.5x speedup in a couple of simple queries using
those aggregates.
With v5, I don't see any difference from master when building on x86
with -m32 for these queries:
select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;
Which queries were you testing?
(Also, unrelated to the patch set, but I was surprised to find
replacing the numeric expressions above with bigint ones
(10_000_000_000 etc) makes the queries at least 5 times slower, and
that's true with a normal 64-bit build as well.)
--
John Naylor
Amazon Web Services
On Wed, 16 Jul 2025 at 10:02, John Naylor <johncnaylorls@gmail.com> wrote:
On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
0005 is the main patch. It adds a few more functions to int128.h and
uses them in numeric.c to allow various functions (mainly aggregate
functions) to use 128-bit integers unconditionally on all platforms.
This applies to the following aggregates:- sum(int8)
- avg(int8)
- stddev_pop(int4)
- stddev_samp(int4)
- var_pop(int4)
- var_samp(int4)Testing on a 32-bit system without native int128 support, I see
something like a 1.3-1.5x speedup in a couple of simple queries using
those aggregates.With v5, I don't see any difference from master when building on x86
with -m32 for these queries:
Thanks for testing!
select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;
The patch won't make any difference to those because "i" is numeric in
those queries, and the patch doesn't touch sum(numeric) or
var_pop(numeric).
Which queries were you testing?
I used the following 2 queries:
SELECT count(*), sum(x), avg(x)
FROM generate_series(1::bigint, 10000000::bigint) g(x);
SELECT count(*), var_pop(x), var_samp(x), stddev_pop(x), stddev_samp(x)
FROM generate_series(1::int, 10000000::int) g(x);
On 64-bit Linux with gcc 14.2 and native int128 support disabled I got
the following results:
Query 1:
HEAD: 1404.096 ms
Patch: 992.818 ms
Query 2:
HEAD: 1498.949 ms
Patch: 935.654 ms
And on a 32-bit Linux VM I got:
Query 1:
HEAD: 2465.202 ms
Patch: 1874.590 ms
Query 2:
HEAD: 2491.991 ms
Patch: 1682.992 ms
I didn't originally try "-m32" on 64-bit Linux because I wasn't sure
how realistic a test that would be, but doing that now I get:
Query 1:
HEAD: 1830.652 ms
Patch: 1411.438 ms
Query 2:
HEAD: 1882.299 ms
Patch: 1299.546 ms
(Also, unrelated to the patch set, but I was surprised to find
replacing the numeric expressions above with bigint ones
(10_000_000_000 etc) makes the queries at least 5 times slower, and
that's true with a normal 64-bit build as well.)
Hmm, are you sure? I don't see that. With -m32, I see:
select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
HEAD: 204.774 ms
Patch: 204.206 ms
(not expecting any difference)
select sum(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
HEAD: 187.426 ms
Patch: 140.741 ms
(as expected, faster than the previous query in HEAD because bigint
generate_series should be faster than numeric generate_series, and
faster still with the sum(bigint) optimisations made by this patch)
select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;
HEAD: 228.386 ms
Patch: 226.712 ms
(not expecting any difference)
select var_pop(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
HEAD: 211.749 ms
Patch: 210.870 ms
(as expected, faster than previous query because of bigint
generate_series, but the patch makes no difference because it doesn't
touch var_pop(bigint))
And another query:
select sum(i::bigint) from generate_series(1e10, 1e10+1e6, 1) i;
HEAD: 271.888 ms
Patch: 227.898 ms
(as expected, slower than the pure numeric version in HEAD because of
the cast, while still using numeric in the aggregate, but with a
decent speedup from the patch, using INT128 in the aggregate)
Regards,
Dean
On Wed, 16 Jul 2025 at 19:23, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On 64-bit Linux with gcc 14.2 and native int128 support disabled I got
the following results:Query 1:
HEAD: 1404.096 ms
Patch: 992.818 msQuery 2:
HEAD: 1498.949 ms
Patch: 935.654 ms
BTW, my other motivation for doing this was to simplify the numeric
code. Even if this had zero performance benefit, as long as it didn't
make things any slower, I would argue that it's worth doing.
The other 2 places in numeric.c that have conditional 128-bit integer
code would require more complex hand-written code to replace, such as
128-bit-by-128-bit division. That's obviously doable, but perhaps not
worth the effort as long as it's only those 2 numeric functions that
need it. OTOH, if there's a wider demand for 128-bit integers, that
might change.
Regards,
Dean
On Thu, Jul 17, 2025 at 1:24 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 16 Jul 2025 at 10:02, John Naylor <johncnaylorls@gmail.com> wrote:
Which queries were you testing?
I used the following 2 queries:
SELECT count(*), sum(x), avg(x)
FROM generate_series(1::bigint, 10000000::bigint) g(x);SELECT count(*), var_pop(x), var_samp(x), stddev_pop(x), stddev_samp(x)
FROM generate_series(1::int, 10000000::int) g(x);On 64-bit Linux with gcc 14.2 and native int128 support disabled I got
the following results:Query 1:
HEAD: 1404.096 ms
Patch: 992.818 msQuery 2:
HEAD: 1498.949 ms
Patch: 935.654 ms
While testing something else on s390x, I noticed that __int128 support
is broken on that platform at least for some versions of clang [1]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=treehopper&dt=2025-07-17%2019%3A26%3A04&stg=configure,
and I see improvement there with this patch:
Query 1:
HEAD: 3015ms
Patch: 2206ms
Query 2:
HEAD: 3394ms
Patch: 2514ms
(Also, unrelated to the patch set, but I was surprised to find
replacing the numeric expressions above with bigint ones
(10_000_000_000 etc) makes the queries at least 5 times slower, and
that's true with a normal 64-bit build as well.)Hmm, are you sure? I don't see that. With -m32, I see:
select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
HEAD: 204.774 ms
Patch: 204.206 ms
(not expecting any difference)select sum(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
HEAD: 187.426 ms
Patch: 140.741 ms
(as expected, faster than the previous query in HEAD because bigint
generate_series should be faster than numeric generate_series, and
faster still with the sum(bigint) optimisations made by this patch)
Hmm, at the time I was surprised too, and ran multiple times but today
I can't reproduce my earlier results, so not sure what happened. :/
--
John Naylor
Amazon Web Services
On Thu, Jul 17, 2025 at 2:30 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
BTW, my other motivation for doing this was to simplify the numeric
code. Even if this had zero performance benefit, as long as it didn't
make things any slower, I would argue that it's worth doing.
I gathered that was the main motivation, and I agree. I looked over
0005 and don't see any issues.
--
John Naylor
Amazon Web Services
On Fri, 18 Jul 2025 at 07:42, John Naylor <johncnaylorls@gmail.com> wrote:
While testing something else on s390x, I noticed that __int128 support
is broken on that platform at least for some versions of clang [1],
and I see improvement there with this patch:Query 1:
HEAD: 3015ms
Patch: 2206msQuery 2:
HEAD: 3394ms
Patch: 2514ms
Thanks for testing.
On Fri, 18 Jul 2025 at 07:47, John Naylor <johncnaylorls@gmail.com> wrote:
On Thu, Jul 17, 2025 at 2:30 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
BTW, my other motivation for doing this was to simplify the numeric
code. Even if this had zero performance benefit, as long as it didn't
make things any slower, I would argue that it's worth doing.I gathered that was the main motivation, and I agree. I looked over
0005 and don't see any issues.
Thanks for reviewing. If there are no objections, I'll push this
shortly (though I'll change INT64_HEX_FORMAT to PRIx64, since it looks
like the former is about to go away).
Regards,
Dean
Hi,
On 2025-07-14 17:22:38 +0700, John Naylor wrote:
On Thu, Jul 10, 2025 at 9:06 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
I think we should wire this up to the buildsystem and our testsuite... Having
testcode that is not run automatically may be helpful while originally
developing something, but it doesn't do anything to detect portability issues
or regressions.Yes, perhaps we should convert src/tools/testint128.c into a new test
extension, src/test/modules/test_int128Here's an update doing that (in 0001). 0002-0005 are unchanged.
(Looking at v3) The new test module runs 10 million rather than a
billion iterations. That still takes 1.2s (after 0005), which seems
excessive for regular buildfarm testing. It seems like we could get by
with fewer than that, by using the time of day for the PRNG seed
(which would also need to be logged on error).
FWIW, there are a few interesting messages on the host of my buildfarm animal:
Aug 19 02:05:31 andres-postgres-buildfarm-v6 kernel: traps: test_int128[1678696] trap divide error ip:55764d59802c sp:7fffc378f7f8 error:0 in test_int128[402c,55764d595000+4000]
Oct 15 03:08:28 andres-postgres-buildfarm-v6 kernel: traps: test_int128[1984641] trap divide error ip:405a8c sp:7ffc41a93c88 error:0 in test_int128[5a8c,401000+5000]
Oct 15 20:02:15 andres-postgres-buildfarm-v6 kernel: traps: test_int128[3346617] trap divide error ip:404b4c sp:7ffc709ab658 error:0 in test_int128[4b4c,401000+4000]
Greetings,
Andres Freund
On Fri, 24 Oct 2025 at 15:23, Andres Freund <andres@anarazel.de> wrote:
FWIW, there are a few interesting messages on the host of my buildfarm animal:
Aug 19 02:05:31 andres-postgres-buildfarm-v6 kernel: traps: test_int128[1678696] trap divide error ip:55764d59802c sp:7fffc378f7f8 error:0 in test_int128[402c,55764d595000+4000]
Oct 15 03:08:28 andres-postgres-buildfarm-v6 kernel: traps: test_int128[1984641] trap divide error ip:405a8c sp:7ffc41a93c88 error:0 in test_int128[5a8c,401000+5000]
Oct 15 20:02:15 andres-postgres-buildfarm-v6 kernel: traps: test_int128[3346617] trap divide error ip:404b4c sp:7ffc709ab658 error:0 in test_int128[4b4c,401000+4000]
Ah, I presume this is because there is a small, but non-zero chance
that the test code will attempt to divide by zero, so it needs to make
sure that z32 is non-zero. Will fix.
Regards,
Dean
On Fri, 24 Oct 2025 at 20:30, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Fri, 24 Oct 2025 at 15:23, Andres Freund <andres@anarazel.de> wrote:
FWIW, there are a few interesting messages on the host of my buildfarm animal:
Aug 19 02:05:31 andres-postgres-buildfarm-v6 kernel: traps: test_int128[1678696] trap divide error ip:55764d59802c sp:7fffc378f7f8 error:0 in test_int128[402c,55764d595000+4000]
Oct 15 03:08:28 andres-postgres-buildfarm-v6 kernel: traps: test_int128[1984641] trap divide error ip:405a8c sp:7ffc41a93c88 error:0 in test_int128[5a8c,401000+5000]
Oct 15 20:02:15 andres-postgres-buildfarm-v6 kernel: traps: test_int128[3346617] trap divide error ip:404b4c sp:7ffc709ab658 error:0 in test_int128[4b4c,401000+4000]Ah, I presume this is because there is a small, but non-zero chance
that the test code will attempt to divide by zero, so it needs to make
sure that z32 is non-zero. Will fix.
I was able to produce a division-by-zero failure by running the full 1
billion iteration test a few times, and I've pushed a fix for it.
Regards,
Dean