[WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates
Hi,
There was recently talk about if we should start using 128-bit integers
(where available) to speed up the aggregate functions over integers
which uses numeric for their internal state. So I hacked together a
patch for this to see what the performance gain would be.
Previous thread:
/messages/by-id/20141017182500.GF2075@alap3.anarazel.de
What the patch does is switching from using numerics in the aggregate
state to int128 and then convert the type from the 128-bit integer in
the final function.
The functions where we can make use of int128 states are:
- sum(int8)
- avg(int8)
- var_*(int2)
- var_*(int4)
- stdev_*(int2)
- stdev_*(int4)
The initial benchmark results look very promising. When summing 10
million int8 I get a speedup of ~2.5x and similarly for var_samp() on 10
million int4 I see a speed up of ~3.7x. To me this indicates that it is
worth the extra code. What do you say? Is this worth implementing?
The current patch still requires work. I have not written the detection
of int128 support yet, and the patch needs code cleanup (for example: I
used an int16_ prefix on the added functions, suggestions for better
names are welcome). I also need to decide on what estimate to use for
the size of that state.
The patch should work and pass make check on platforms where __int128_t
is supported.
The simple benchmarks:
CREATE TABLE test_int8 AS SELECT x::int8 FROM generate_series(1,
10000000) x;
Before:
# SELECT sum(x) FROM test_int8;
sum
----------------
50000005000000
(1 row)
Time: 2521.217 ms
After:
# SELECT sum(x) FROM test_int8;
sum
----------------
50000005000000
(1 row)
Time: 1022.811 ms
CREATE TABLE test_int4 AS SELECT x::int4 FROM generate_series(1,
10000000) x;
Before:
# SELECT var_samp(x) FROM test_int4;
var_samp
--------------------
8333334166666.6667
(1 row)
Time: 3808.546 ms
After:
# SELECT var_samp(x) FROM test_int4;
var_samp
--------------------
8333334166666.6667
(1 row)
Time: 1033.243 ms
Andreas
Attachments:
int128-agg-v1.patchtext/x-patch; name=int128-agg-v1.patchDownload+467-28
On Sat, Oct 25, 2014 at 12:38 PM, Andreas Karlsson <andreas@proxel.se>
wrote:
Hi,
There was recently talk about if we should start using 128-bit integers
(where available) to speed up the aggregate functions over integers which
uses numeric for their internal state. So I hacked together a patch for
this to see what the performance gain would be.Previous thread: /messages/by-id/20141017182500.
GF2075@alap3.anarazel.deWhat the patch does is switching from using numerics in the aggregate
state to int128 and then convert the type from the 128-bit integer in the
final function.The functions where we can make use of int128 states are:
- sum(int8)
- avg(int8)
- var_*(int2)
- var_*(int4)
- stdev_*(int2)
- stdev_*(int4)The initial benchmark results look very promising. When summing 10 million
int8 I get a speedup of ~2.5x and similarly for var_samp() on 10 million
int4 I see a speed up of ~3.7x. To me this indicates that it is worth the
extra code. What do you say? Is this worth implementing?The current patch still requires work. I have not written the detection of
int128 support yet, and the patch needs code cleanup (for example: I used
an int16_ prefix on the added functions, suggestions for better names are
welcome). I also need to decide on what estimate to use for the size of
that state.The patch should work and pass make check on platforms where __int128_t is
supported.The simple benchmarks:
CREATE TABLE test_int8 AS SELECT x::int8 FROM generate_series(1, 10000000)
x;Before:
# SELECT sum(x) FROM test_int8;
sum
----------------
50000005000000
(1 row)Time: 2521.217 ms
After:
# SELECT sum(x) FROM test_int8;
sum
----------------
50000005000000
(1 row)Time: 1022.811 ms
CREATE TABLE test_int4 AS SELECT x::int4 FROM generate_series(1, 10000000)
x;Before:
# SELECT var_samp(x) FROM test_int4;
var_samp
--------------------
8333334166666.6667
(1 row)Time: 3808.546 ms
After:
# SELECT var_samp(x) FROM test_int4;
var_samp
--------------------
8333334166666.6667
(1 row)Time: 1033.243 ms
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
These are some nice improvements.
As far as I'm aware int128 types are supported on every major compiler when
compiling for 64bit platforms. Right?
On Sat, Oct 25, 2014 at 9:38 AM, Andreas Karlsson <andreas@proxel.se> wrote:
Hi,
There was recently talk about if we should start using 128-bit integers
(where available) to speed up the aggregate functions over integers which
uses numeric for their internal state. So I hacked together a patch for this
to see what the performance gain would be.Previous thread:
/messages/by-id/20141017182500.GF2075@alap3.anarazel.deWhat the patch does is switching from using numerics in the aggregate state
to int128 and then convert the type from the 128-bit integer in the final
function.The functions where we can make use of int128 states are:
- sum(int8)
- avg(int8)
- var_*(int2)
- var_*(int4)
- stdev_*(int2)
- stdev_*(int4)The initial benchmark results look very promising. When summing 10 million
int8 I get a speedup of ~2.5x and similarly for var_samp() on 10 million
int4 I see a speed up of ~3.7x. To me this indicates that it is worth the
extra code. What do you say? Is this worth implementing?
yes.
merlin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-10-28 11:05:11 -0200, Arthur Silva wrote:
On Sat, Oct 25, 2014 at 12:38 PM, Andreas Karlsson <andreas@proxel.se>
As far as I'm aware int128 types are supported on every major compiler when
compiling for 64bit platforms. Right?
Depends on what you call major. IIRC some not that old msvc versions
don't for example. Also, there's a couple 32 platforms with int128 bit
support. So I think we should just add a configure test defining the
type + a feature macro.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/28/2014 02:05 PM, Arthur Silva wrote:
As far as I'm aware int128 types are supported on every major compiler
when compiling for 64bit platforms. Right?
Both gcc and clang support __int128_t, but I do not know about other
compilers like icc and MSVC.
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/28/2014 03:24 PM, Andres Freund wrote:
On 2014-10-28 11:05:11 -0200, Arthur Silva wrote:
On Sat, Oct 25, 2014 at 12:38 PM, Andreas Karlsson <andreas@proxel.se>
As far as I'm aware int128 types are supported on every major compiler when
compiling for 64bit platforms. Right?Depends on what you call major. IIRC some not that old msvc versions
don't for example. Also, there's a couple 32 platforms with int128 bit
support. So I think we should just add a configure test defining the
type + a feature macro.
It wouldn't be too hard to just do:
struct {
int64 high_bits;
uint64 low_bits;
} pg_int128;
and some macros for the + - etc. operators. It might be less work than
trying to deal with the portability issues of a native C datatype for this.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-10-28 15:54:30 +0200, Heikki Linnakangas wrote:
On 10/28/2014 03:24 PM, Andres Freund wrote:
On 2014-10-28 11:05:11 -0200, Arthur Silva wrote:
On Sat, Oct 25, 2014 at 12:38 PM, Andreas Karlsson <andreas@proxel.se>
As far as I'm aware int128 types are supported on every major compiler when
compiling for 64bit platforms. Right?Depends on what you call major. IIRC some not that old msvc versions
don't for example. Also, there's a couple 32 platforms with int128 bit
support. So I think we should just add a configure test defining the
type + a feature macro.It wouldn't be too hard to just do:
struct {
int64 high_bits;
uint64 low_bits;
} pg_int128;and some macros for the + - etc. operators. It might be less work than
trying to deal with the portability issues of a native C datatype for this.
And noticeably slower. At least x86-64 does all of this in hardware...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
It wouldn't be too hard to just do:
struct {
int64 high_bits;
uint64 low_bits;
} pg_int128;
and some macros for the + - etc. operators. It might be less work than
trying to deal with the portability issues of a native C datatype for this.
-1. That's not that easy, especially for division, or if you want to
worry about overflow. The point of this patch IMO is to get some low
hanging fruit; coding our own int128 arithmetic doesn't sound like
"low hanging" to me.
Also, we've already got the configure infrastructure for detecting
whether a platform has working int64. It really shouldn't be much
work to transpose that to int128 (especially if we don't care about
printf support, which I think we don't).
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/28/2014 04:06 PM, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
It wouldn't be too hard to just do:
struct {
int64 high_bits;
uint64 low_bits;
} pg_int128;and some macros for the + - etc. operators. It might be less work than
trying to deal with the portability issues of a native C datatype for this.-1. That's not that easy, especially for division, or if you want to
worry about overflow.
The patch doesn't do division with the 128-bit integers. It only does
addition and multiplication. Those are pretty straightforward to implement.
The point of this patch IMO is to get some low
hanging fruit; coding our own int128 arithmetic doesn't sound like
"low hanging" to me.
I wasn't thinking of writing a full-fledged 128-bit type, just the the
few operations needed for this patch.
Also, we've already got the configure infrastructure for detecting
whether a platform has working int64. It really shouldn't be much
work to transpose that to int128 (especially if we don't care about
printf support, which I think we don't).
It would be nicer to be able to use the same code on all platforms. With
a configure test, we'd still need a fallback implementation for
platforms that don't have it.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/28/2014 03:40 PM, Heikki Linnakangas wrote:
The patch doesn't do division with the 128-bit integers. It only does
addition and multiplication. Those are pretty straightforward to implement.
The patch uses division when converting from __int128_t to Numeric.
- Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/28/2014 04:47 PM, Andreas Karlsson wrote:
On 10/28/2014 03:40 PM, Heikki Linnakangas wrote:
The patch doesn't do division with the 128-bit integers. It only does
addition and multiplication. Those are pretty straightforward to implement.The patch uses division when converting from __int128_t to Numeric.
Oh, I see. Hmph, looks like I'm losing an argument..
Moving on to other issues, isn't 128 bits too small to store the squares
of the processed numbers? That could overflow..
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/28/2014 04:01 PM, Heikki Linnakangas wrote:
Moving on to other issues, isn't 128 bits too small to store the squares
of the processed numbers? That could overflow..
Yeah, which is why stddev_*(int8) and var_*(int8) still have to use
Numeric in the aggregate state. For the int2 and int4 versions it is
fine to use __int128_t.
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
Here is version 2 of the patch which detects the presence of gcc/clang
style 128-bit integers and has been cleaned up to a reviewable state. I
have not added support for any other compilers since I found no
documentation 128-bit support with icc or MSVC. I do not have access to
any Windows machines either.
A couple of things I was not sure about was the naming of the new
functions and if I should ifdef the size of the aggregate state in the
catalog or not.
--
Andreas Karlsson
Attachments:
int128-agg-v2.patchtext/x-patch; name=int128-agg-v2.patchDownload
On 11/13/2014 02:03 AM, Andreas Karlsson wrote:
Here is version 2 of the patch which detects the presence of gcc/clang
style 128-bit integers and has been cleaned up to a reviewable state. I
have not added support for any other compilers since I found no
documentation 128-bit support with icc or MSVC. I do not have access to
any Windows machines either.A couple of things I was not sure about was the naming of the new
functions and if I should ifdef the size of the aggregate state in the
catalog or not.
The correct file is attached in the message.
--
Andreas Karlsson
Attachments:
int128-agg-v2.patchtext/x-patch; name=int128-agg-v2.patchDownload+543-102
Andreas Karlsson wrote:
On 11/13/2014 02:03 AM, Andreas Karlsson wrote:
Here is version 2 of the patch which detects the presence of gcc/clang
style 128-bit integers and has been cleaned up to a reviewable state. I
have not added support for any other compilers since I found no
documentation 128-bit support with icc or MSVC. I do not have access to
any Windows machines either.A couple of things I was not sure about was the naming of the new
functions and if I should ifdef the size of the aggregate state in the
catalog or not.
configure is a generated file. If your patch touches it but not
configure.in, there is a problem.
diff --git a/configure b/configure
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/13/2014 03:38 AM, Alvaro Herrera wrote:
configure is a generated file. If your patch touches it but not
configure.in, there is a problem.
Thanks for pointing it out, I have now fixed it.
--
Andreas Karlsson
Attachments:
int128-agg-v3.patchtext/x-patch; name=int128-agg-v3.patchDownload+560-102
On 11/13/14 7:57 PM, Andreas Karlsson wrote:
On 11/13/2014 03:38 AM, Alvaro Herrera wrote:
configure is a generated file. If your patch touches it but not
configure.in, there is a problem.Thanks for pointing it out, I have now fixed it.
There is something odd about your patch. I claims that all files are
new files, e.g.:
diff --git a/src/backend/utils/adt/numeric.c
b/src/backend/utils/adt/numeric.c
new file mode 100644
index d61af92..98183b4
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/14/2014 02:25 AM, Peter Eisentraut wrote:
There is something odd about your patch. I claims that all files are
new files, e.g.:diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index d61af92..98183b4 *** a/src/backend/utils/adt/numeric.c --- b/src/backend/utils/adt/numeric.c
Those lines look fine to me.
It does not say I have added a new file, it only claims I have changed
the mode of the file. And that is an artifact from using the script
src/tools/git-external-diff which always outputs a line with "new file
mode".
--
Andreas Karlsson
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 14 November 2014 at 13:57, Andreas Karlsson <andreas@proxel.se> wrote:
On 11/13/2014 03:38 AM, Alvaro Herrera wrote:
configure is a generated file. If your patch touches it but not
configure.in, there is a problem.Thanks for pointing it out, I have now fixed it.
Hi Andreas,
These are some very promising performance increases.
I've done a quick pass of reading the patch. I currently don't have a
system with a 128bit int type, but I'm working on that.
Just a couple of things that could do with being fixed:
This fragment needs fixed to put braces on new lines
if (state) {
numstate.N = state->N;
int16_to_numericvar(state->sumX, &numstate.sumX);
int16_to_numericvar(state->sumX2, &numstate.sumX2);
} else {
numstate.N = 0;
}
It also looks like your OIDs have been nabbed by some jsonb stuff.
DETAIL: Key (oid)=(3267) is duplicated.
I'm also wondering why in numeric_int16_sum() you're doing:
#else
return numeric_sum(fcinfo);
#endif
but you're not doing return int8_accum() in the #else part
of int8_avg_accum()
The same goes for int8_accum_inv() and int8_avg_accum_inv(), though perhaps
you're doing it here because of the elog() showing the wrong function name.
Although that's a pretty much "shouldn't ever happen" case that mightn't be
worth worrying about.
Also since I don't currently have a machine with a working int128, I
decided to benchmark master vs patched to see if there was any sort of
performance regression due to numeric_int16_sum calling numeric_sum, but
I'm a bit confused with the performance results as it seems there's quite a
good increase in performance with the patch, I'd have expected there to be
no change.
CREATE TABLE t (value bigint not null);
insert into t select a.a from generate_series(1,5000000) a(a);
vacuum;
int128_bench.sql has select sum(value) from t;
Master:
D:\Postgres\installb\bin>pgbench.exe -f d:\int128_bench.sql -n -T 120
postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 120 s
number of transactions actually processed: 92
latency average: 1304.348 ms
tps = 0.762531 (including connections establishing)
tps = 0.762642 (excluding connections establishing)
Patched:
D:\Postgres\install\bin>pgbench.exe -f d:\int128_bench.sql -n -T 120
postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 120 s
number of transactions actually processed: 99
latency average: 1212.121 ms
tps = 0.818067 (including connections establishing)
tps = 0.818199 (excluding connections establishing)
Postgresql.conf is the same in both instances.
I've yet to discover why this is any faster.
Regards
David Rowley
On Tue, Dec 16, 2014 at 7:04 PM, David Rowley <dgrowleyml@gmail.com> wrote:
It also looks like your OIDs have been nabbed by some jsonb stuff.
DETAIL: Key (oid)=(3267) is duplicated.
Use src/include/catalog/unused_oids to track the OIDs not yet used in
the catalogs when adding new objects for a feature.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers