GiST penalty functions [PoC]

Started by Andrey Borodin over 9 years ago · 28 messages · pgsql-hackers
#1Andrey Borodin
amborodin@acm.org

Hi, hackers!

Some time ago I put a patch on the commitfest that slightly optimizes
GiST inserts and updates. Its effect in general is quite modest (a 15%
performance improvement), but for sorted data inserts are 3 times
quicker. I attribute this effect to mitigation of deficiencies of the
old split algorithm.
Alexander Korotkov advised me to lookup some features of the RR*-tree.

The RR*-tree differs from combination of Gist + cube extension in two
algorithms:
1. Algorithm to choose subtree for insertion
2. Split algorithm

This message concerns the implementation of the first one in GiST. I
think that both of these algorithms will improve querying performance.

Long story short, there are two problems in the choose-subtree algorithm.
1. GiST chooses a subtree via penalty calculation for each entry,
while the RR*-tree employs complicated conditional logic: when there are
MBBs (Minimum Bounding Boxes) which do not require extension, the
smallest of them is chosen; in some cases, entries are chosen not by
volume, but by their margin.
2. The RR*-tree algorithm jumps through entry comparisons
non-linearly; I think this means that a GiST penalty computation
function would have to access other entries on a page.

In this message I address only the first problem. I want to construct a
penalty function that will choose:
1. The entry with zero volume and smallest margin that can
accommodate the insertion tuple without extension, if one exists;
2. The entry with smallest volume that can accommodate the insertion
tuple without extension, if one exists;
3. The entry with zero volume increase after insert and smallest
margin increase, if one exists;
4. The entry with minimal volume increase after insert.

Current cube behavior inspects only the 4th option by returning the
MBB's volume increase as the penalty (float). To implement all 4
options, I use a hack: the order of positive floats corresponds exactly
to the order of integers with the same binary representation (I'm not
sure this holds for every supported platform). So I use two bits of the
float as if it were an integer, and all the others hold the shifted bits
of the corresponding float: option 4 – volume increase, 3 – margin
increase, 2 – MBB volume, 1 – MBB margin. You can check the
reinterpretation-cast function pack_float() in the patch.
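As an illustration only (this is a hedged sketch, not the patch's actual pack_float, and it assumes IEEE 754 single-precision floats), the trick can be written like this: shift the value's bits down by two and put the option number in the freed top non-sign bits, so comparing the resulting floats compares options first and original values second.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the reinterpretation-cast trick, assuming IEEE 754 floats:
 * for non-negative floats, numeric order matches the order of their bit
 * patterns read as unsigned integers.  We shift the value's bits right
 * by two and store the option number (0..3) in the two bits freed below
 * the sign bit.  Two low mantissa bits are lost, and values near
 * FLT_MAX in the top option can turn into NaN -- a real implementation
 * has to care about that.
 */
static float
pack_float_sketch(float value, int option)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof(bits));   /* reinterpret without aliasing UB */
    bits = (bits >> 2) + UINT32_C(0x20000000) * (uint32_t) option;
    memcpy(&value, &bits, sizeof(value));
    return value;
}
```

With this packing, any value in a lower option compares below any value in a higher option, while order within one option is preserved (minus the two lost bits).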

I've tested patch performance with the attached test. On my machine the
patch slows index construction from 60 to 76 seconds, reduces the size
of the index from 85 MB to 82 MB, and reduces the time of aggregate
computation from 36 seconds to 29 seconds.

I do not know: should I continue this work in cube, or would it be
better to fork cube?
Has anyone already tried an RR*-tree implementation? Scientific papers
show a very good search performance increase.

Please let me know if you have any ideas, information or interest in this topic.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

cube_improved_penalty_v1.diff (text/plain; +53 -0)
gistselecttest.sql (application/octet-stream)
#2Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#1)
Re: GiST penalty functions [PoC]

Hi hackers!

Here is the new patch version.
With the help of Mikhail Bakhterev I've optimized the subroutines of the
penalty function. Index build time is now roughly equivalent to the
build time before the patch (test attached at the thread start).
The time of SELECT statement execution is reduced by 40%.
Changes in the patch:
1. Wrong usage of realms is fixed
2. Cube size and margin (edge) functions are optimized to reduce the
count of memory write instructions (the results of these functions were
written on every cycle of a loop)
3. All private functions are marked static inline
4. Comments are formatted per project style

I'm going to put this into the commitfest queue, because the performance
of GiST queries is improved significantly and I do not see any serious
drawbacks.

Any ideas about this patch are welcome. Especially, I'm concerned about
the portability of the pack_float function.
Does every supported Postgres platform conform to the IEEE 754
floating-point specification?

Also, is it possible to hit any problems with float NaNs during float
packing?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

cube_improved_penalty_v2.diff (text/plain; +81 -26)
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrey Borodin (#2)
Re: GiST penalty functions [PoC]

Andrew Borodin <borodin@octonica.com> writes:

Does every supported Postgres platform conforms to IEEE 754 floating
point specification?

Possibly, but it is against project policy to allow code to assume so.
That pack_float function is absolutely not acceptable IMV, and would
still be if it were adequately documented, which it's far away from
being.

On general grounds, I would forget all the "inline"s. Modern compilers
are generally able to make their own decisions about it, and trying to put
your thumb on the scales this heavily is not likely to improve the code.

Haven't really read the patch, just responding to a couple points you
mentioned in the intro.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#3)
Re: GiST penalty functions [PoC]

Thank you for your comments, Tom.

Modern compilers are generally able to make their own decisions about it, and trying to put your thumb on the scales this heavily is not likely to improve the code.

Well, I have tested that the combination of "static inline" affects
index build performance on a scale of 5%. Though I didn't test with
"static" only.
AFAIK the compiler cannot prove that a function's input and output
arrays do not intersect, so it emits lots of writes to the output
address inside the loop body.

That pack_float function is absolutely not acceptable

Well, possibly there are ways to achieve the penalty optimization
without such hacks, but that failed for pg_sphere and in the PostGIS
code. They reverted a plain arithmetic optimization without bit hacks,
because it didn't work. This one works.
There is another way: advance the GiST API. But I'm not sure it is
possible to do so without breaking compatibility with many existing
extensions.

Best regards, Andrey Borodin, Octonica & Ural Federal University.


#5Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#4)
Re: GiST penalty functions [PoC]

Here is the new version of the patch.
The function pack_float is now commented better.
All inline keywords are removed. I haven't found any performance loss from that.
Removing the statics showed a 1%-7% performance loss in SELECT
computation (3 test runs), so I left the statics where they are.

Actually, to avoid this kind of hack, it would probably be better to
fork GiST to GiSTv2 and implement many API changes there:
1. Bulk load API
2. Richer choose-subtree API
3. Allow key tests to return not just NO/MAYBE answers, but a SURE
answer, to skip key tests for the underlying subtree
And some other improvements would require breaking changes for extensions.
The GiST idea is almost 25 years old; modern spatial indices differ a
lot from what existed in the 90s.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

cube_improved_penalty_v3.diff (text/plain; +91 -25)
#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrey Borodin (#1)
Re: GiST penalty functions [PoC]

On 08/29/2016 09:04 AM, Andrew Borodin wrote:

In this message I address only first problem. I want to construct a
penalty function that will choose:
1. Entry with a zero volume and smallest margin, that can
accommodate insertion tuple without extension, if one exists;
2. Entry with smallest volume, that can accommodate insertion tuple
without extension, if one exists;
3. Entry with zero volume increase after insert and smallest margin
increase, if one exists;
4. Entry with minimal volume increase after insert.

Looking at the code, by "margin", you mean the sum of all "edges", i.e.
of each dimension, of the cube. I guess the point of that is to
differentiate between cubes that have the same volume, but a different
shape.

So, let's try to come up with a scheme that doesn't require IEEE 754
floats. Those above cases can be split into two categories, depending on
whether the new value has zero volume or not. We can use a completely
different scheme for the two cases, if we want, because when we're
choosing a target page to insert to, penalty gets called for every
original tuple, but with the same new tuple.

Here's a scheme I just came up with. There might be better ones, but
it's something.

Let's have:

oE: Original tuple's edge sum
nE: New tuple's edge sum
dE: Edge increase

oV: Original tuple's volume
nV: New tuple's volume
dV: Volume increase

Case A: New entry has zero volume.
------

Two sub-cases:
A1: if dE > 0, use dE. dE must be in the range [0, nE]
A2: otherwise, use oE.

So how do we cram those two into a single float?

If we offset case A1 by nE, we can use the range [0, nE] for A2.
Something like this pseudocode:

if (dE > 0)
    return nE + dE;       /* A1, offset dE into range [nE, inf] */
else
    return nE - (nE/oE);  /* A2, scale oE into range [0, nE] */

Case B: New entry has non-zero volume
------
B1: if dV > 0, use dV. dV must be in the range [0, nV].
B2: if dE > 0, use dE. dE must be in the range [0, nE].
B3: oV, otherwise.

By offsetting cases B1 and B2, we can again divide the range into three
parts, with pseudocode like:

if (dV > 0)
    return nV + nE + dV;  /* B1, offset dV into range [nV+nE, inf] */
else if (dE > 0)
    return nV + dE;       /* B2, offset dE into range [nV, nV+nE] */
else
    return nV - (nV/oV);  /* B3, scale oV into range [0, nV] */
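Transcribed into runnable C (a sketch only: the helper name penalty_case_b is made up for illustration, and the formulas are exactly the pseudocode above):

```c
#include <assert.h>

/*
 * Sketch of case B above (new entry has non-zero volume), transcribed
 * from the pseudocode.  oV/nV/nE/dV/dE follow the definitions in this
 * message; the function name is hypothetical.
 */
static double
penalty_case_b(double oV, double nV, double nE, double dV, double dE)
{
    if (dV > 0)
        return nV + nE + dV;   /* B1: offset dV into range [nV+nE, inf] */
    else if (dE > 0)
        return nV + dE;        /* B2: offset dE into range [nV, nV+nE] */
    else
        return nV - (nV / oV); /* B3: scale oV into range [0, nV] */
}
```

Any B1 penalty exceeds any B2 penalty, which in turn exceeds any B3 penalty, so the three sub-cases never interleave.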

I've tested patch performance with attached test. On my machine patch
slows index construction time from 60 to 76 seconds, reduces size of
the index from 85Mb to 82Mb, reduces time of aggregates computation
from 36 seconds to 29 seconds.

Hmm. That's a pretty large increase in construction time. Have you done
any profiling on where the time goes?

I do not know: should I continue this work in cube, or it'd be better
to fork cube?

Should definitely continue work within cube, IMHO. I don't think forking
it to a new datatype would make this any easier.

- Heikki


#7Andrey Borodin
amborodin@acm.org
In reply to: Heikki Linnakangas (#6)
Re: GiST penalty functions [PoC]

Hi Heikki!
Thank you for reviewing the code; it's always inspiring when work is
noticed (:

Looking at the code, by "margin", you mean the sum of all "edges", i.e.
of each dimension, of the cube.

Indeed. As far as I remember, this is the terminology of the old R*-tree paper.
Now they use the word "perimeter", which seems closer for
English-speaking folks, so in future patches I'll switch to "perimeter".

I guess the point of that is to differentiate between cubes that have
the same volume, but a different shape.

<Not very sufficient R-tree details>
Somewhat like this. For an R[something]-tree, volume is the ultimate
measure to compare figures, close to a multidimensional cube. The tree
must tend to form MBRs with equal edges to ensure search optimizations
spread across all dimensions equally.
But in practice volume degrades quickly. When you index points, you
quickly get a page with a "plane": lots of points with one dimension
fixed at some value. And then the R-tree does not work anymore; inserts
are somewhat randomized by GiST, but in practice the tree is a lot worse
than randomized, due to randomization skew towards early insert candidates.
Perimeter is not as good as volume in the general case. But it does not
degrade until you have truly equal object MBRs.

So this was volume vs perimeter. For example, in MySQL they use a
#define to choose one of these strategies, which does not seem a good
design to me. In this patch I propose to use perimeter when the volume
strategy has degraded, effectively when one dimension's edge is 0 or
some dimensions' edges are close to epsilon.

But there is one more tie to break. When you choose a subtree for
insertion, some candidates have zero volume extension and zero edge
extension. They do not need to be extended to accommodate the new tuple.
In this case we choose the smallest one, by volume. If all of them have
0 volume, we choose by perimeter.

All in all, we have 4 different cases here, ordered by priority.
</Not very sufficient R-tree details>

So, let's try to come up with a scheme that doesn't require IEEE 754
floats.

1. Converting my algorithm to float ops. I actually hadn't thought about
it before, but it seems impossible. Bit shift makes no sense for regular
float operations; under no circumstances can exponent bits shift into
the significand field and vice versa. If we do not move the last bit of
the exponent, all significand-field bits are garbage, which leaves us
with only 6 bits for the actual value, which is unacceptable.
2. Your algorithm, aside from losing some bits of precision (which is
absolutely acceptable - we have like 31 of them and that's a lot),
relies on a false assumption. We compare tuples on a page not by the MBR
of the inserted tuple, but by the MBR of the tuple on the page, which is
different for every penalty calculation.
I'll put it in your var names; they are good for reasoning about the
proposed algorithm.

oE: Original tuple's edge sum (perimeter measure). This tuple is on a page.
nE: New tuple's edge sum. This tuple is to be inserted. We calculate the
penalty to choose the subtree with minimum penalty.
dE: Edge increase. Edge of union(original, new) minus oE.

oV: Original tuple's volume
nV: New tuple's volume
dV: Volume increase

The best desired algorithm: sort tuples by dV, then by dE, then by oV,
then by oE (all ascending) and pick the top 1.
Unfortunately, I do not see a way to do that in the 31 bits of a float
(in GiST we cannot use the sign bit; <=0 is used as a magic condition).
So what is implemented is the following:
Say we have a number ALOT which is guaranteed to be bigger than dV, dE,
oV and oE.

if (dV > 0)
    penalty = ALOT*3 + dV;  /* nonzero dV goes last */
else if (dE > 0)
    penalty = ALOT*2 + dE;  /* then goes dE */
else if (oV > 0)
    penalty = ALOT + oV;
else
    penalty = oE;
/* oE is zero for subtrees containing only equal points. In this case we
   pass the split optimization possibility to the next GiST attribute. */

Volume and perimeter of the new entry do not affect the choice of
subtree directly, whether they are zero or not.
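As a self-contained sketch (the function name is hypothetical, and ALOT here is just an illustrative constant that must exceed every possible dV, dE, oV and oE), the four realms can be expressed as:

```c
#include <assert.h>

/*
 * Sketch of the four-realm ordering described above.  A lower return
 * value means a better subtree candidate: first sort by dV, then dE,
 * then oV, then oE.
 */
#define ALOT 1e30

static double
choose_subtree_penalty(double dV, double dE, double oV, double oE)
{
    if (dV > 0)
        return ALOT * 3 + dV;  /* realm 3: nonzero volume increase sorts last */
    else if (dE > 0)
        return ALOT * 2 + dE;  /* realm 2: then nonzero edge increase */
    else if (oV > 0)
        return ALOT + oV;      /* realm 1: tie-break by original volume */
    else
        return oE;             /* realm 0: finally by original perimeter */
}
```

The realms never interleave: any penalty from a higher realm is larger than any penalty from a lower one.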

Hmm. That's a pretty large increase in construction time. Have you done

any profiling on where the time goes?
Since I've changed only one function, I didn't consider profiling. I've
asked Mikhail to check the disassembly of the function call tree,
implemented his recommendations, and patch v2 got all the build
performance back (: And now it computes aggregates 40% faster.

continue work within cube

There is one more option: implement an extension index using the CREATE
ACCESS METHOD feature of 9.6. It is an option: we can skip all the GiST
API hacks and implement a real RR*-tree, plus some other improvements.
But, on the other hand, improvement of the GiST API could be beneficial
to other extensions.

Again, thanks for your attention. Possibly we can come up with some
solution without dirty hacks.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrey Borodin (#7)
Re: GiST penalty functions [PoC]

On 09/07/2016 09:42 AM, Andrew Borodin wrote:

2. Your algorithm, among loosing some bits of precision (which is
absolutely acceptable - we have like 31 of them and that’s a lot) rely on
false assumption. We compare tuples on page not by MBR of inserted tuple,
but by MBR of tuple on page, which is different for every penalty
calculation.

The penalty function has two arguments: the new tuple, and the existing
tuple on the page. When we're inserting, it gets called for every
existing tuple on the page, with the same new tuple. And then we compare
the returned penalties. So for a single insertion, the "new tuple"
argument stays the same for every call.

- Heikki


#9Andrey Borodin
amborodin@acm.org
In reply to: Heikki Linnakangas (#8)
Re: GiST penalty functions [PoC]

Oh, sorry. I made one more attempt, and now I see your algorithm differently.

So you propose to use oE and oV as markers of the borders for what I call a Realm.
But there may be very few significant bits in one of these ranges.
The pg_sphere and PostGIS extensions tried to use 1 as a marker, with a
similar formula. And that generated not a good tree.
Here's how they did it:
https://github.com/postgis/postgis/commit/9569e898078eeac8928891e8019ede2cbf27d06f

I'll try to compose a patch for your formula later. I think we should
just try and see; it is easier than reasoning about floating-point
representations.

Best regards, Andrey Borodin, Octonica & Ural Federal University.


#10Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#9)
Re: GiST penalty functions [PoC]

Well, arithmetic is too fragile.

This is a version of float packing with arithmetical packaging:

static float
pack_float(float actualValue, int realm)
{
    double max, min;

    max = FLT_MAX / (8 >> realm);
    min = FLT_MAX / (16 >> realm);
    if (realm == 0)
        min = 0;
    /* squeeze the actual value between min and max */
    return (min + (actualValue * (max - min) / FLT_MAX));
}

Inserts are the same as with the bit-hacked pack_float, but selects are
5 times slower.
When we try to pack values linearly into a range, we lose too many bits.
Here is how it happens: in floats, adding small values to big values
causes the loss of the small values.
Applied to Heikki's algorithm, this means that nV, nE and dV can all be
in different mantissa ranges. (Please note that dV can be many times
greater than nV, or many times smaller than nV.)
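The loss described here is easy to demonstrate (a trivial throwaway sketch, not from the patch):

```c
#include <assert.h>

/*
 * In IEEE single precision, adding a small value to a big one loses the
 * small addend entirely once their magnitudes differ by more than about
 * 2^24 -- the "different mantissa ranges" problem described above.
 */
static int
float_absorbs(float big, float small)
{
    return big + small == big;
}
```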

An integer bit shift of a float has no arithmetical meaning. It would be
hard to describe in formulas what carrying an exponent bit into the
significand field means. But this bit shift preserves the order among
almost all float values (except the 2 controllably lost bits, and some
values become sNaN). The entire idea of the advanced GiST penalty stands
on this.

The other way is to add to the API an optional handler which executes
the whole choose-subtree algorithm inside cube (or another extension).

Best regards, Andrey Borodin, Octonica & Ural Federal University.


#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrey Borodin (#10)
Re: GiST penalty functions [PoC]

On 09/07/2016 09:20 PM, Andrew Borodin wrote:

Well, arithmetics is too fragile.

This version of float packing with arithmetical packaging
static float
pack_float(float actualValue, int realm)
{
    double max, min;
    max = FLT_MAX / (8 >> realm);
    min = FLT_MAX / (16 >> realm);
    if (realm == 0)
        min = 0;
    /* squeeze the actual value between min and max */
    return (min + (actualValue * (max - min) / FLT_MAX));
}
Inserts are the same as of bithacked pack_float, but selects are 5 times slower.
When we are trying to pack value linearly into range we lose too many bits.

That's why I suggested scaling it by the new value's volume and/or
edge-sum. I was hoping that the old and new values are roughly of the
same magnitude, so that it would work out. I guess not.

But we could probably do something like the above too, if we use
logarithmic or exponential, rather than linear, packing. Something like:

static float
pack_float(float actualValue, int realm)
{
    double val;

    val = sqrt(sqrt(actualValue));

    if (realm == 0)
        return val;
    if (realm == 1)
        return val * sqrt(sqrt(FLT_MAX));
    if (realm == 2)
        return val * sqrt(FLT_MAX);
}

Unfortunately, sqrt(x) isn't very cheap.

- Heikki


#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#11)
Re: GiST penalty functions [PoC]

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Unfortunately, sqrt(x) isn't very cheap.

You'd be surprised: sqrt is built-in on most modern hardware. On my
three-year-old workstation, sqrt(x) seems to take about 2.6ns. For
comparison, the pack_float version posted in
<CAJEAwVGdb92E-XKfMLN3cxM2BWbbA3rrffzDzg8Ki1H5iQEk2Q@mail.gmail.com>
takes 3.9ns (and draws multiple compiler warnings, too).

regards, tom lane


#13Михаил Бахтерев
In reply to: Tom Lane (#12)
Re: GiST penalty functions [PoC]

Excuse me for the intervention.

It depends. For instance, I run PostgreSQL on a modern MIPS CPU, which
does not have hardware sqrt support.

But you are right, it is supported in most cases. And if the execution
speed of this very function is of concern, sqrtf(x) should be used
instead of sqrt(x).

Despite this, Andrew's solution gives a more accurate representation of
values. And as far as I understand, this improves overall performance by
decreasing the overall number of instructions which must be executed.

It is possible to speed up Andrew's implementation and get rid of the
warnings by using bit-masks and unions. Something like:

union {
    float f;
    struct {
        unsigned int mantissa:23, exponent:8, sign:1;
    } bits;
};

I am sorry, I have no time to check this. But it is common wisdom to
avoid pointer-based memory accesses in high-performance code, as they
create a lot of false write-to-read dependencies.

- Mikhail, respectfully


On Wed, Sep 07, 2016 at 05:58:42PM -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Unfortunately, sqrt(x) isn't very cheap.

You'd be surprised: sqrt is built-in on most modern hardware. On my
three-year-old workstation, sqrt(x) seems to take about 2.6ns. For
comparison, the pack_float version posted in
<CAJEAwVGdb92E-XKfMLN3cxM2BWbbA3rrffzDzg8Ki1H5iQEk2Q@mail.gmail.com>
takes 3.9ns (and draws multiple compiler warnings, too).

regards, tom lane

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Михаил Бахтерев (#13)
Re: GiST penalty functions [PoC]

On 09/08/2016 09:39 AM, Михаил Бахтерев wrote:

Excuse me for intervention.

It depends. For instance, i run PostgreSQL on the modern MIPS CPU, which
does not have sqrt support.

But you are right, it is supported in most cases. And if execution speed
of this very fuction is of concern, sqrtf(x) should be used instead of
sqrt(x).

Despite this, Andrew's solution gives more accurate representation of
values. And as far as i understand, this improves overall performance by
decreasing the overall amount of instructions, which must be executed.

BTW, I would be OK with the bit-twiddling hack, if we had an autoconf
check for IEEE 754 floats, and a graceful fallback for other systems.
The fallback could be simply the current penalty function. You wouldn't
get the benefit from the better penalty function on non-IEEE systems,
then, but it would still be correct.

It is possible to speed up Andrew's implementation and get rid of
warnings by using bit-masks and unions. Something like:

union {
float f;
struct {
unsigned int mantissa:23, exponent:8, sign:1;
} bits;
}

I am sorry, i have no time to check this. But it is common wisdom to
avoid pointer-based memory accesses in high-performance code, as they
create a lot of false write-to-read dependencies.

The compiler should be smart enough to generate the same instructions
either way. A union might be more readable, though. (We don't need to
extract the mantissa, exponent and sign, so a union of float and int32
would do.)

- Heikki


#15Andrey Borodin
amborodin@acm.org
In reply to: Heikki Linnakangas (#14)
Re: GiST penalty functions [PoC]

autoconf check for IEEE 754 floats

The Autoconf manual says the following:

it is safe to assume IEEE-754 in most portable code these days

https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability

A union might be more readable

Here is the union version of the patch. It's 10% slower than both the
original cube and the dereference version. I have no idea why.
Select performance is improved as in v3.

Also I've investigated a bit why the linear packing failed. It's useful
to think in terms of bijections, not in terms of arithmetic functions.

float 1 is 1065353216 hex 3f800000
FLT_MAX / ( 8 >> 3 ) is 2139095039 hex 7f7fffff
FLT_MAX / ( 8 >> 2 ) is 2130706431 hex 7effffff
FLT_MAX / ( 8 >> 1 ) is 2122317823 hex 7e7fffff
FLT_MAX / ( 8 >> 0 ) is 2113929215 hex 7dffffff

These realm borders are completely wrong.
That means I used 800 thousand values to pack 2 billion values of
realms 1-3, and all the other 2.1 billion values to pack realm 0. The
fragile arithmetic was done wrong.

What I should have used was:
Realm 3
INT32_MAX is NaN (or something a little less than NaN)
3*INT32_MAX/4 is 3.689348e+19
In total, a little less than 512M different values

Realm 2
3*INT32_MAX/4 is 3.689348e+19
INT32_MAX/2 is 2.000000e+00
In total 512M different values

Realm 1
INT32_MAX/2 is 2.000000e+00
INT32_MAX/4 is 1.084202e-19
In total 512M different values

Realm 0
INT32_MAX/4 is 1.084202e-19
0 is 0.000000e+00
In total 512M different values

Though I haven't tested it yet. I'm not sure, BTW, that hardcoding these
constants is a good idea.
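The integer-bits-to-float correspondence used in the tables above is easy to check mechanically (a throwaway sketch, not part of the patch):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/*
 * Reinterpret a 32-bit integer's bits as an IEEE 754 single-precision
 * float -- the bijection the realm borders above are expressed in.
 */
static float
int_bits_as_float(int32_t i)
{
    float f;

    memcpy(&f, &i, sizeof(f));
    return f;
}
```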

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

cube_improved_penalty_v4.diff (text/plain; +95 -24)
#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#14)
Re: GiST penalty functions [PoC]

Heikki Linnakangas <hlinnaka@iki.fi> writes:

BTW, I would be OK with the bit-twiddling hack, if we had an autoconf
check for IEEE 754 floats, and a graceful fallback for other systems.

I would still object to the version submitted, on the grounds of the
compiler warnings it causes. Possibly those could be avoided with
a union, though; Float4GetDatum doesn't produce such warnings.

BTW, would someone explain to me why using a float here will not
fail catastrophically for inputs outside the range of float?
Cubes are based on doubles, not floats.

regards, tom lane


#17Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#16)
Re: GiST penalty functions [PoC]

BTW, would someone explain to me why using a float here will not fail catastrophically for inputs outside the range of float?

Indeed, it will. And that's one of the reasons I'm considering advancing
the GiST API instead of advancing extensions.

It won't crash, but it will produce a terrible index, not suitable for
performing efficient queries.
GiST treats penalty this way:
/* disallow negative or NaN penalty */
if (isnan(penalty) || penalty < 0.0)
    penalty = 0.0;

Any NaN is a "perfect fit".

Calculating a real (say, float) penalty and choosing the subtree with
the smallest penalty is an idea from '92. A modern spatial index
requires more than just a float to compare subtrees.

Best regards, Andrey Borodin, Octonica & Ural Federal University.


#18Михаил Бахтерев
In reply to: Andrey Borodin (#15)
Re: GiST penalty functions [PoC]

If you are still interested: here are 3 versions of pack_float. The
union version of pack_float should run faster. The code is simpler and
the dependencies are easier.

But it may be less accurate or even wrong, since for signed integers
(x>>2) and (x/4) are not the same. Consider x = -1.
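His example is quickly verified (a throwaway check; note that right-shifting a negative signed integer is implementation-defined in C, though mainstream compilers use an arithmetic shift):

```c
#include <assert.h>

/*
 * Division truncates toward zero; an arithmetic right shift rounds
 * toward negative infinity, so the two disagree for some negative
 * operands -- e.g. x = -1, where -1/4 == 0 but -1 >> 2 == -1.
 */
static int
shift_vs_divide(int x)
{
    return (x >> 2) == (x / 4);
}
```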

You may try pack_float_good, which gives the same asm as v3, but without
warnings.

- Mikhail, respectfully


On Thu, Sep 08, 2016 at 01:29:36PM +0500, Andrew Borodin wrote:

autoconf check for IEEE 754 floats

Autoconf man says folowing:

it is safe to assume IEEE-754 in most portable code these days

https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability

A union might be more readable

Here is union version of the patch. It's slower 10% than original cube
and dereference version. Have no idea why.
Select performance is improved as in v3.

Attachments:

pack-float.c (text/x-c)
pack-float.s (text/x-asm)
#19Andrey Borodin
amborodin@acm.org
In reply to: Михаил Бахтерев (#18)
Re: GiST penalty functions [PoC]

Thank you for your attention to detail, Mikhail.

pack_float_good() looks good. But I'm not sure an inline struct init is allowed under ANSI C. Converting it to the regular ancient form b.fp = v; won't change the compile result, would it?

Regards, Andrey Borodin.


#20Михаил Бахтерев
In reply to: Andrey Borodin (#19)
Re: GiST penalty functions [PoC]

Yes, you are right: ANSI C allows only load-time initializers. The
attached ANSI-compatible version leads to the same assembly.

And let me suggest a bit-twiddling version as well. It gives 12
instructions instead of 13. 12 is better, as a modern x86 CPU will fetch
them in at most 3 cycles, one less than for 13 instructions. Also this
bit-twiddling is more parallel at the instruction level.

And for ARM, which is unsurpassed at bit-twiddling, this code is way
better.

Of course speed is influenced by a lot of factors, as always, so it
needs to be tested on some datasets.

- Mikhail, respectfully


On Fri, Sep 09, 2016 at 08:50:53AM +0500, Andrey Borodin wrote:

Thank you for your attention to details, Mikhail.

pack_float_good() looks good. But I'm not sure inline strict init is allowed under ansi C. Converting to regular ancient form b.fp = v; won't change compile result, would it?

Regards, Andrey Borodin.

Attachments:

pack-float.c (text/x-c)
pack-float.s (text/x-asm)
pack-float-arm.s (text/x-asm)
#21Bruce Momjian
bruce@momjian.us
In reply to: Andrey Borodin (#15)
#22Andrey Borodin
amborodin@acm.org
In reply to: Bruce Momjian (#21)
#23Andrey Borodin
amborodin@acm.org
In reply to: Bruce Momjian (#21)
#24Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#23)
#25Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrey Borodin (#24)
#26Andrey Borodin
amborodin@acm.org
In reply to: Heikki Linnakangas (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#26)
#28Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#27)