Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#1)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

On Mon, Jan 26, 2015 at 8:43 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

Another spinoff from the abbreviation discussion. Peter Geoghegan
suggested on IRC that numeric would benefit from abbreviation, and
indeed it does (in some cases by a factor of about 6-7x or more, because
numeric comparison is no speed demon).

Cool.

What I find particularly interesting about this patch is that it makes
sorting numerics significantly faster than even sorting float8 values,
at least some of the time, even though the latter has generic
SortSupport (for fmgr elision). Example:

postgres=# create table foo as select x::float8 x, x::numeric y from
(select random() * 10000000 x from generate_series(1,1000000) a) b;
SELECT 1000000

This query takes about 525ms after repeated executions: select *
from (select * from foo order by x offset 1000000000) i;

However, this query takes about 412ms:
select * from (select * from foo order by y offset 1000000000) i;

There is probably a good case to be made for float8 abbreviation
support....just as well that your datum abbreviation patch doesn't
imply that pass-by-value types cannot be abbreviated across the board
(it only implies that abbreviation of pass-by-value types is not
supported in the datum sort case). :-)

Anyway, the second query above (the one with the numeric ORDER BY
column) is enormously faster than the same query executed against
master's tip. That takes about 1720ms following repeated executions.
So at least that case is over 4x faster, suggesting that abbreviation
support for numeric is well worthwhile. So I'm signed up to review
this one too.
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#2)

Re: Re: Abbreviated keys for Numeric

"Peter" == Peter Geoghegan <pg@heroku.com> writes:

Peter> What I find particularly interesting about this patch is that it
Peter> makes sorting numerics significantly faster than even sorting
Peter> float8 values,

I get a much smaller difference there than you do.

Obvious overheads in float8 comparison include having to check for NaN,
and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
a store/load to memory rather than just using a register. Looking at
those might be more beneficial than messing with abbreviations.
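
For illustration, the NaN check mentioned above can be sketched as follows. This is a Python sketch rather than PostgreSQL's C code. It assumes the ordering PostgreSQL documents for float8, where NaN compares equal to NaN and greater than every non-NaN value. The name `float8_cmp` is hypothetical.

```python
import math
from functools import cmp_to_key


def float8_cmp(a: float, b: float) -> int:
    """Three-way comparison that orders NaN after every other value."""
    if math.isnan(a):
        # NaN equals NaN and is greater than any non-NaN value
        return 0 if math.isnan(b) else 1
    if math.isnan(b):
        # Any non-NaN value is less than NaN
        return -1
    # Ordinary case: yields -1, 0 or 1
    return (a > b) - (a < b)


def sort_float8(values):
    """Sort a list of floats using the NaN-aware comparator."""
    return sorted(values, key=cmp_to_key(float8_cmp))
```

Every comparison pays for the two NaN branches before reaching the ordinary case, which is the kind of per-comparison overhead being described.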

--
Andrew (irc:RhodiumToad)


pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#3)

Re: Re: Abbreviated keys for Numeric

On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

Obvious overheads in float8 comparison include having to check for NaN,
and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
a store/load to memory rather than just using a register. Looking at
those might be more beneficial than messing with abbreviations.

Aren't there issues with the alignment of double precision floating
point numbers on x86, too? Maybe my information there is at least
partially obsolete. But it seems we'd have to control for this to be
sure.

I am not seriously suggesting pursuing abbreviation for float8 in the
near term - numeric is clearly what we should concentrate on. It's
interesting that abbreviation of float8 could potentially make sense,
though.
--
Peter Geoghegan


Andres Freund

andres@anarazel.de

about 11 years ago

In reply to: Peter Geoghegan (#4)

Re: Re: Abbreviated keys for Numeric

On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote:

On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

Obvious overheads in float8 comparison include having to check for NaN,
and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
a store/load to memory rather than just using a register. Looking at
those might be more beneficial than messing with abbreviations.

Aren't there issues with the alignment of double precision floating
point numbers on x86, too? Maybe my information there is at least
partially obsolete. But it seems we'd have to control for this to be
sure.

I think getting rid of the function call for DatumGetFloat8() would be
quite the win. On x86-64 the conversion then should amount to mov
%rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both
instructions have a cycle count of 1 + L1 access latency (4) + 2 because
they use the same execution port. So it's about 12 fully pipelineable
cycles. 2 if the pipeline can be kept busy otherwise. I doubt that'd be
noticeable if the conversion were inlined.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Petr Jelinek

petr@2ndquadrant.com

about 11 years ago

In reply to: Andres Freund (#5)

Re: Re: Abbreviated keys for Numeric

On 27/01/15 00:51, Andres Freund wrote:

On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote:

On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

Obvious overheads in float8 comparison include having to check for NaN,
and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
a store/load to memory rather than just using a register. Looking at
those might be more beneficial than messing with abbreviations.

Aren't there issues with the alignment of double precision floating
point numbers on x86, too? Maybe my information there is at least
partially obsolete. But it seems we'd have to control for this to be
sure.

I think getting rid of the function call for DatumGetFloat8() would be
quite the win. On x86-64 the conversion then should amount to mov
%rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both
instructions have a cycle count of 1 + L1 access latency (4) + 2 because
they use the same execution port. So it's about 12 fully pipelineable
cycles. 2 if the pipeline can be kept busy otherwise. I doubt that'd be
noticeable if the conversion were inlined.

IIRC the DatumGetFloat8 was quite visible in the perf when I was writing
the array version of width_bucket. It was one of the motivations for
making special float8 version since not having to call it had
significant effect. Sadly I don't remember if it was the function call
itself or the conversion anymore.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#2)

Re: Re: Abbreviated keys for Numeric

"Peter" == Peter Geoghegan <pg@heroku.com> writes:

Peter> What I find particularly interesting about this patch is that it
Peter> makes sorting numerics significantly faster than even sorting
Peter> float8 values,

Played some more with this. Testing on some different gcc versions
showed that the results were not consistent between versions; the latest
I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8
as slightly slower; the difference was all in the time of the float8
case, the time for numeric was virtually the same.

For one specific test query, taking the best time of multiple runs,

float8: gcc4.7 = 980ms, gcc4.9 = 833ms
numeric: gcc4.7 = 940ms, gcc4.9 = 920ms

(vs. 650ms for bigint on either version)

So honestly I think abbreviation for float8 is a complete red herring.

Also, I couldn't get any detectable benefit from inlining
DatumGetFloat8, though I may have to play more with that to be certain
(above tests did not have any float8-related modifications at all, just
the datum and numeric abbrevs patches).

--
Andrew (irc:RhodiumToad)


Gavin Flower

GavinFlower@archidevsys.co.nz

about 11 years ago

In reply to: Andrew Gierth (#7)

Re: Re: Abbreviated keys for Numeric

On 28/01/15 06:29, Andrew Gierth wrote:

"Peter" == Peter Geoghegan <pg@heroku.com> writes:

Peter> What I find particularly interesting about this patch is that it
Peter> makes sorting numerics significantly faster than even sorting
Peter> float8 values,

Played some more with this. Testing on some different gcc versions
showed that the results were not consistent between versions; the latest
I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8
as slightly slower; the difference was all in the time of the float8
case, the time for numeric was virtually the same.

For one specific test query, taking the best time of multiple runs,

float8: gcc4.7 = 980ms, gcc4.9 = 833ms
numeric: gcc4.7 = 940ms, gcc4.9 = 920ms

(vs. 650ms for bigint on either version)

So honestly I think abbreviation for float8 is a complete red herring.

Also, I couldn't get any detectable benefit from inlining
DatumGetFloat8, though I may have to play more with that to be certain
(above tests did not have any float8-related modifications at all, just
the datum and numeric abbrevs patches).

Since gcc5.0 is due to be released in less than 3 months, it might be
worth testing with that.

Cheers,
Gavin


pg@bowt.ie

about 11 years ago

In reply to: Peter Geoghegan (#4)

Re: Re: Abbreviated keys for Numeric

On Mon, Jan 26, 2015 at 3:35 PM, Peter Geoghegan <pg@heroku.com> wrote:

I am not seriously suggesting pursuing abbreviation for float8 in the
near term - numeric is clearly what we should concentrate on. It's
interesting that abbreviation of float8 could potentially make sense,
though.

Note that in the IEEE 754 standard, the exponent does not have a sign.
Rather, an exponent bias is subtracted from it (127 for single
precision floats, and 1023 for double precision floats). This, and the
bit sequence of the mantissa allows floats to be compared and sorted
correctly even when interpreting them as integers. The exception is
NaN, but then we have an exception to that exception.

This is a really old idea, actually. I first saw it in a paper written
in the 1960s, long before math coprocessors became standard. Haven't
really thrashed this out enough, but offhand I guess it would work.

The other problem is that positive IEEE floating-point numbers sort
like integers with the same bits, and negative IEEE floating-point
numbers sort in the reverse order of integers with the same bits. So
we'd probably end up with an encoding scheme that accounted for that,
and forget about tie-breakers (or have a NOOP "return 0" tie-breaker).
An example of the problem:

postgres=# create table foo (a float8);
CREATE TABLE
postgres=# insert into foo values (1), (2), (3), (-1), (-2), (-3);
INSERT 0 6
postgres=# select * from foo order by a;
a
----
-1
-2
-3
1
2
3
(6 rows)
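
The ordering problem, and one possible encoding that accounts for it, can be sketched in Python. This is only an illustration under stated assumptions, not code from any patch. `raw_bits_signed` and `float8_sort_key` are hypothetical names, and the sign-flip transform shown is a commonly used one, not necessarily the scheme that would be chosen for tuplesort.

```python
import struct


def raw_bits_signed(x: float) -> int:
    """Reinterpret the 8 bytes of a double as a signed 64-bit integer."""
    (bits,) = struct.unpack(">q", struct.pack(">d", x))
    return bits


def float8_sort_key(x: float) -> int:
    """Map a double to an unsigned 64-bit key whose order matches float order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & (1 << 63):
        # Negative: invert every bit so larger magnitudes get smaller keys
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Non-negative: set the top bit so these keys sit above all negatives
    return bits | (1 << 63)


values = [1.0, 2.0, 3.0, -1.0, -2.0, -3.0]
by_raw_bits = sorted(values, key=raw_bits_signed)   # negatives come out reversed
by_sort_key = sorted(values, key=float8_sort_key)   # matches numeric order
```

Sorting by the raw signed bits reproduces the order shown in the psql output above, while sorting by the transformed key gives -3, -2, -1, 1, 2, 3. Two caveats for this sketch: -0.0 and 0.0 receive different keys even though they compare equal as floats, and a NaN with the sign bit set would sort first instead of last, so a real implementation would need to normalize both.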

The reason that this conversion usually doesn't occur in library
sorting routines is because it only helps significantly on x86, has
additional memory overhead, and ordinarily requires that we convert
back when we're done sorting. The cost/benefit analysis for tuplesort
would be much more favorable than a generic float sorting case, given
that we pretty much have datum1 storage as a sunk cost anyway, and
given that we don't need to convert back the datum1 representation,
and given that the encoding process would be dirt cheap and occur at a
time when we were likely totally bottlenecked on memory bandwidth.

I don't want to get bogged down on this - the numeric abbreviation
patch *is* still much more compelling - but maybe abbreviation of
float8 isn't a red herring after all.
--
Peter Geoghegan


#10

robertmhaas@gmail.com

about 11 years ago

In reply to: Peter Geoghegan (#9)

Re: Re: Abbreviated keys for Numeric

On Sat, Jan 31, 2015 at 7:07 PM, Peter Geoghegan <pg@heroku.com> wrote:

I don't want to get bogged down on this - the numeric abbreviation
patch *is* still much more compelling - but maybe abbreviation of
float8 isn't a red herring after all.

I'm completely on-board with doing something about numeric. I think
it might be pretty foolish to try to do anything about any data type
the CPU has hard-wired knowledge of. We're basically betting that we
can do better in software than they did in hardware, and even if that
happens to be true on some systems under some circumstances, it leaves
us in a poor position to leverage future improvements to the silicon.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#11

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Andrew Gierth (#1)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

Hi,

On 26.1.2015 17:43, Andrew Gierth wrote:

Another spinoff from the abbreviation discussion. Peter Geoghegan
suggested on IRC that numeric would benefit from abbreviation, and
indeed it does (in some cases by a factor of about 6-7x or more, because
numeric comparison is no speed demon).

This patch abbreviates numerics to a weight+initial digits
representation (the details differ slightly between 32bit and 64bit
builds to make the best use of the available bits).

On 32bit, numeric values that are between about 10^-44 and 10^83, and
which differ either in order of magnitude or in the leading 7
significant decimal digits (not base-10000 digits, single decimals) will
get distinct abbreviations. On 64bit the range is 10^-176 to 10^332 and
the first 4 base-10000 digits are kept, thus comparing 13 to 16 decimal
digits. This is expected to be ample for applications using numeric to
store numbers; applications that store things in numeric that aren't
actually numbers might not see the benefit, but I have not found any
detectable slowdown from the patch even on constructed pathological
data.
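
To make the weight+initial digits idea concrete, here is a toy Python sketch of a 64-bit-style abbreviation. It is not the encoding used by the patch. The bias, the bit layout, the restriction to non-negative values and the name `abbreviate` are all assumptions made for illustration. It keeps the base-10000 weight plus the first four base-10000 digits, so values that differ only beyond those digits get the same key and would need the full comparison as a tie-breaker.

```python
from decimal import Decimal

WEIGHT_BIAS = 256   # hypothetical bias that keeps the stored weight non-negative
DIGIT_BITS = 54     # 10000**4 < 2**54, so four base-10000 digits fit in 54 bits


def abbreviate(value: Decimal) -> int:
    """Toy abbreviated key for a non-negative, finite Decimal."""
    if not value.is_finite() or value < 0:
        raise ValueError("sketch handles only non-negative finite values")
    if value == 0:
        return 0
    _, digits, exponent = value.as_tuple()
    coefficient = int("".join(map(str, digits)))
    # Weight: the power of 10000 of the leading base-10000 digit
    weight = value.adjusted() // 4
    if not 0 <= weight + WEIGHT_BIAS < 512:
        raise ValueError("magnitude outside the range this sketch encodes")
    # Scale so that exactly four base-10000 digits remain, truncating the rest
    shift = exponent - 4 * (weight - 3)
    if shift >= 0:
        leading = coefficient * 10 ** shift
    else:
        leading = coefficient // 10 ** (-shift)
    # Weight in the high bits, leading digits in the low bits
    return ((weight + WEIGHT_BIAS) << DIGIT_BITS) | leading
```

Because the weight occupies the high bits, values of different magnitude compare correctly on the key alone, and values of equal weight fall back to their leading digits.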

I've done some testing on this (along with the other patch doing the
same with Datum values), but I'm yet to see a query that actually
benefits from this.

For example with the same percentile_disc() test as in the other thread:

create table stuff as select random()::numeric as randnum
from generate_series(1,1000000);

analyze stuff;

select percentile_disc(0) within group (order by randnum) from stuff;

I get pretty much no difference in runtimes (not even for the smallest
dataset, where the Datum patch speedup was significant).

What am I doing wrong?

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#12

pg@bowt.ie

about 11 years ago

In reply to: Tomas Vondra (#11)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

On Fri, Feb 20, 2015 at 1:33 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

For example with the same percentile_disc() test as in the other thread:

create table stuff as select random()::numeric as randnum
from generate_series(1,1000000);

analyze stuff;

select percentile_disc(0) within group (order by randnum) from stuff;

I get pretty much no difference in runtimes (not even for the smallest
dataset, where the Datum patch speedup was significant).

What am I doing wrong?

So you're testing both the patches (numeric + datum tuplesort) at the same time?

I can't think why this would make any difference. Did you forget to
initdb, so that the numeric sortsupport routine was used?

--
Peter Geoghegan


#13

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Peter Geoghegan (#12)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

On 21.2.2015 00:14, Peter Geoghegan wrote:

On Fri, Feb 20, 2015 at 1:33 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

For example with the same percentile_disc() test as in the other
thread:

create table stuff as select random()::numeric as randnum from
generate_series(1,1000000);

analyze stuff;

select percentile_disc(0) within group (order by randnum) from
stuff;

I get pretty much no difference in runtimes (not even for the
smallest dataset, where the Datum patch speedup was significant).

What am I doing wrong?

So you're testing both the patches (numeric + datum tuplesort) at the
same time?

No, I was just testing two similar patches separately. I.e. master vs.
each patch separately.

I can't think why this would make any difference. Did you forget to
initdb, so that the numeric sortsupport routine was used?

No, but just to be sure I repeated the benchmarks and I still get the
same results. Each test run does this:

1) remove data directory
2) initdb
3) copy postgresql.conf (with minor tweaks - work_mem/shared_buffers)
4) start
5) create database
6) create test table
7) run a query 5x

I repeated this, just to be sure, but nope - still no speedup :-(

For master vs. patch, I do get these results:

                             master   patched   speedup
---------------------------------------------------------
generate_series(1,1000000)     1.20      1.25      0.96
generate_series(1,2000000)     2.75      2.75      1.00
generate_series(1,3000000)     4.40      4.40      1.00

So, no difference :(
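
For reference, the speedup column is consistent with dividing the master timing by the patched timing, so values below 1.00 mean the patched build was slower. A small Python sketch using the numbers from the table above follows. The function name `speedup` is hypothetical, and the unit of the timings is not stated in the message.

```python
# (master, patched) timings per row count, copied from the table above
timings = {
    1_000_000: (1.20, 1.25),
    2_000_000: (2.75, 2.75),
    3_000_000: (4.40, 4.40),
}


def speedup(master: float, patched: float) -> float:
    """Ratio of master time to patched time; above 1.0 means patched is faster."""
    return master / patched


# Speedup per row count, rounded to two decimals as in the table
speedups = {rows: round(speedup(m, p), 2) for rows, (m, p) in timings.items()}
```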

Scripts attached, but it's really trivial test - hopefully I haven't
done anything dumb.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14

pg@bowt.ie

about 11 years ago

In reply to: Tomas Vondra (#13)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

On Fri, Feb 20, 2015 at 4:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

So you're testing both the patches (numeric + datum tuplesort) at the
same time?

No, I was just testing two similar patches separately. I.e. master vs.
each patch separately.

Well, you're sorting numeric here, no? Why should it matter that a
datum sort has abbreviation support, if the underlying type (numeric)
does not support abbreviation? OTOH, why should having opclass
abbreviation support (for numeric) matter if the class of tuple sorted
(datum "tuples") does not support abbreviation? You need both to
meaningfully benchmark either (as long as you're looking at a case
involving both).

I suggest looking at datum sorts with text for the datum sort patch,
and non-datum tuplesort cases for the numeric patch, at least until
such time as one or the other is committed.
--
Peter Geoghegan


#15

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Peter Geoghegan (#14)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

On 21.2.2015 01:17, Peter Geoghegan wrote:

On Fri, Feb 20, 2015 at 4:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

So you're testing both the patches (numeric + datum tuplesort) at the
same time?

No, I was just testing two similar patches separately. I.e. master vs.
each patch separately.

Well, you're sorting numeric here, no? Why should it matter that a
datum sort has abbreviation support, if the underlying type (numeric)
does not support abbreviation? OTOH, why should having opclass
abbreviation support (for numeric) matter if the class of tuple sorted
(datum "tuples") does not support abbreviation? You need both to
meaningfully benchmark either (as long as you're looking at a case
involving both).

I suggest looking at datum sorts with text for the datum sort patch,
and non-datum tuplesort cases for the numeric patch, at least until
such time as one or the other is committed.

Isn't this patch about adding abbreviated keys for Numeric data type?
That's how I understood it, and looking into numeric_sortsup.patch seems
to confirm that.

There's another patch for Datum, but that's a different thread.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#16

pg@bowt.ie

about 11 years ago

In reply to: Tomas Vondra (#15)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

On Fri, Feb 20, 2015 at 4:42 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Isn't this patch about adding abbreviated keys for Numeric data type?
That's how I understood it, and looking into numeric_sortsup.patch seems
to confirm that.

There's another patch for Datum, but that's a different thread.

Right...so don't test a datum sort case, since that isn't supported at
all in the master branch. Your test case is invalid for that reason.

--
Peter Geoghegan


#17

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Peter Geoghegan (#16)

Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

On 21.2.2015 01:45, Peter Geoghegan wrote:

On Fri, Feb 20, 2015 at 4:42 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Isn't this patch about adding abbreviated keys for Numeric data type?
That's how I understood it, and looking into numeric_sortsup.patch seems
to confirm that.

There's another patch for Datum, but that's a different thread.

Right...so don't test a datum sort case, since that isn't supported at
all in the master branch. Your test case is invalid for that reason.

What do you mean by 'Datum sort case'? The test I was using is this:

create table stuff as select (random())::numeric as randnum
from generate_series(1,1000000);

select percentile_disc(0) within group (order by randnum) from stuff;

That's a table with a Numeric column, and a sort on that Numeric, no?

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#18

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Tomas Vondra (#17)

Re: Abbreviated keys for Numeric

"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Right...so don't test a datum sort case, since that isn't supported
at all in the master branch. Your test case is invalid for that
reason.

Tomas> What do you mean by 'Datum sort case'?

A case where the code path goes via tuplesort_begin_datum rather than
tuplesort_begin_heap.

Tomas> The test I was using is this:

Tomas> select percentile_disc(0) within group (order by randnum) from stuff;

Sorting single columns in aggregate calls uses the Datum sort path (in
fact I think it's currently the only place that does).

Do that test with _both_ the Datum and Numeric sort patches in place,
and you will see the effect. With only the Numeric patch, the numeric
abbrev code is not called.

If you want a test that works without the Datum patch, try:

select count(*) from (select randnum from stuff order by randnum) s;

--
Andrew (irc:RhodiumToad)


#19

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Andrew Gierth (#18)

Re: Abbreviated keys for Numeric

On 21.2.2015 02:00, Andrew Gierth wrote:

"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Right...so don't test a datum sort case, since that isn't supported
at all in the master branch. Your test case is invalid for that
reason.

Tomas> What do you mean by 'Datum sort case'?

A case where the code path goes via tuplesort_begin_datum rather than
tuplesort_begin_heap.

Tomas> The test I was using is this:

Tomas> select percentile_disc(0) within group (order by randnum) from stuff;

Sorting single columns in aggregate calls uses the Datum sort path (in
fact I think it's currently the only place that does).

Do that test with _both_ the Datum and Numeric sort patches in place,
and you will see the effect. With only the Numeric patch, the numeric
abbrev code is not called.

D'oh! Thanks for the explanation.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#20

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Tomas Vondra (#19)

Re: Abbreviated keys for Numeric

Hi,

On 21.2.2015 02:06, Tomas Vondra wrote:

On 21.2.2015 02:00, Andrew Gierth wrote:

"Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Right...so don't test a datum sort case, since that isn't supported
at all in the master branch. Your test case is invalid for that
reason.

Tomas> What do you mean by 'Datum sort case'?

A case where the code path goes via tuplesort_begin_datum rather than
tuplesort_begin_heap.

Tomas> The test I was using is this:

Tomas> select percentile_disc(0) within group (order by randnum) from stuff;

Sorting single columns in aggregate calls uses the Datum sort path (in
fact I think it's currently the only place that does).

Do that test with _both_ the Datum and Numeric sort patches in place,
and you will see the effect. With only the Numeric patch, the numeric
abbrev code is not called.

D'oh! Thanks for the explanation.

OK, so I've repeated the benchmarks with both patches applied, and I
think the results are interesting. I extended the benchmark a bit - see
the SQL script attached.

1) multiple queries

select percentile_disc(0) within group (order by val) from stuff

select count(distinct val) from stuff

select * from
(select * from stuff order by val offset 100000000000) foo

2) multiple data types - int, float, text and numeric

3) multiple scales - 1M, 2M, 3M, 4M and 5M rows

Each query was executed 10x, the timings were averaged. I do know some
of the data types don't benefit from the patches, but I included them to
get a sense of how noisy the results are.

I did the measurements for

1) master
2) master + datum_sort_abbrev.patch
3) master + datum_sort_abbrev.patch + numeric_sortsup.patch

and then computed the speedup for each type/scale combination (the
impact on all the queries is almost exactly the same).

Complete results are available here: http://bit.ly/1EA4mR9

I'll post the summary here, although some of the numbers are about
the other abbreviated keys patch.

1) datum_sort_abbrev.patch vs. master

scale    float    int     numeric    text
---------------------------------------------
1        101%     99%     105%       404%
2        101%     98%     96%        98%
3        101%     101%    99%        97%
4        100%     101%    98%        95%
5        99%      98%     93%        95%

2) numeric_sortsup.patch vs. master

scale    float    int     numeric    text
---------------------------------------------
1        97%      98%     374%       396%
2        100%     101%    407%       96%
3        99%      102%    407%       95%
4        99%      101%    423%       92%
5        95%      99%     411%       92%

I think the gains are pretty awesome - I mean, 400% speedup for Numeric
across the board? Yes please!

The gains for text are also very nice, although in this case that only
happens for the smallest scale (1M rows), and for larger scales it's
actually slower than current master :-(

It's not just rainbows and unicorns, though. With both patches applied,
text sorts get even slower (up to ~8% slower than master). It also seems
to impact float (which gets ~5% slower, for some reason), but I don't
see how that could happen ... but I suspect this might be noise.

I'll repeat the tests on another machine after the weekend, and post an
update whether the results are the same or significantly different.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21

Gavin Flower

GavinFlower@archidevsys.co.nz

about 11 years ago

In reply to: Tomas Vondra (#20)

#22

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Gavin Flower (#21)

#23

pg@bowt.ie

about 11 years ago

In reply to: Tomas Vondra (#20)

#24

pg@bowt.ie

about 11 years ago

In reply to: Peter Geoghegan (#23)

#25

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Peter Geoghegan (#23)

#26

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Peter Geoghegan (#24)

#27

pg@bowt.ie

about 11 years ago

In reply to: Tomas Vondra (#26)

#28

pg@bowt.ie

about 11 years ago

In reply to: Peter Geoghegan (#27)

#29

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Peter Geoghegan (#27)

#30

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Gavin Flower (#21)

#31

Gavin Flower

GavinFlower@archidevsys.co.nz

about 11 years ago

In reply to: Andrew Gierth (#30)

#32

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Andrew Gierth (#30)

#33

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Tomas Vondra (#32)

#34

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Andrew Gierth (#33)

#35

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Tomas Vondra (#34)

#36

tomas.vondra@2ndquadrant.com

about 11 years ago

In reply to: Andrew Gierth (#35)

#37

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#33)

#38

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#35)

#39

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#38)

#40

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#39)

#41

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#40)

#42

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#40)

#43

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#40)

#44

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#42)

#45

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#41)

#46

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#45)

#47

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#44)

#48

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#46)

#49

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#43)

#50

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#41)

#51

pg@bowt.ie

about 11 years ago

In reply to: Robert Haas (#50)

#52

robertmhaas@gmail.com

about 11 years ago

In reply to: Peter Geoghegan (#49)

#53

robertmhaas@gmail.com

about 11 years ago

In reply to: Peter Geoghegan (#51)

#54

pg@bowt.ie

about 11 years ago

In reply to: Robert Haas (#52)

#55

pg@bowt.ie

about 11 years ago

In reply to: Robert Haas (#53)

#56

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#54)

#57

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#56)

#58

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#49)

#59

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#58)

#60

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#56)

#61

pg@bowt.ie

about 11 years ago

In reply to: Robert Haas (#60)

#62

robertmhaas@gmail.com

about 11 years ago

In reply to: Peter Geoghegan (#54)

#63

robertmhaas@gmail.com

about 11 years ago

In reply to: Peter Geoghegan (#61)

#64

pg@bowt.ie

about 11 years ago

In reply to: Robert Haas (#63)

#65

robertmhaas@gmail.com

about 11 years ago

In reply to: Peter Geoghegan (#64)

#66

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#43)

#67

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#66)

#68

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#67)

#69

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Robert Haas (#65)

#70

Kenneth Marshall

ktm@rice.edu

about 11 years ago

In reply to: Andrew Gierth (#56)

#71

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#69)

#72

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Peter Geoghegan (#71)

#73

pg@bowt.ie

about 11 years ago

In reply to: Andrew Gierth (#72)

#74

pg@bowt.ie

about 11 years ago

In reply to: Peter Geoghegan (#57)

#75

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#69)

#76

robertmhaas@gmail.com

about 11 years ago

In reply to: Robert Haas (#75)

#77

Petr Jelinek

petr@2ndquadrant.com

about 11 years ago

In reply to: Robert Haas (#76)

#78

robertmhaas@gmail.com

about 11 years ago

In reply to: Petr Jelinek (#77)

#79

Petr Jelinek

petr@2ndquadrant.com

about 11 years ago

In reply to: Robert Haas (#78)

#80

robertmhaas@gmail.com

about 11 years ago

In reply to: Petr Jelinek (#79)

#81

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Robert Haas (#75)

#82

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#81)

#83

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Robert Haas (#82)

#84

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#83)

#85

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Robert Haas (#84)

#86

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#85)

#87

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Robert Haas (#86)

#88

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Tom Lane (#87)

#89

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Robert Haas (#86)

#90

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andrew Gierth (#89)

#91

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#90)

#92

robertmhaas@gmail.com

about 11 years ago

In reply to: Andrew Gierth (#89)

#93

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Robert Haas (#91)

#94

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Tom Lane (#93)

#95

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Tom Lane (#90)

#96

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andrew Gierth (#95)

#97

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Tom Lane (#96)

#98

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#96)

#99

Andrew Dunstan

andrew@dunslane.net

about 11 years ago

In reply to: Tom Lane (#96)

#100

Andrew Dunstan

andrew@dunslane.net

about 11 years ago

In reply to: Robert Haas (#98)

#101

pg@bowt.ie

about 11 years ago

In reply to: Andrew Dunstan (#99)

#102

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Robert Haas (#98)

#103