GiST -- making my index faster makes it slower

Started by David Blasby over 21 years ago · 8 messages
#1 David Blasby
dblasby@refractions.net

I just tried an experiment that I thought would improve the performance
of the PostGIS spatial index.

The old way used a BOX (bounding box of 4 doubles) as the key in the index.

The new way uses a bounding box of 4 single-precision floats (it's only
16 bytes long - a BOX is 32). I thought that it would significantly
reduce the size of the index (it did) and that the index scans would be
faster, since there are fewer disk pages to fiddle through and those
pages would be more likely to fit in the disk cache.

It turns out this is not the case - it's significantly slower. I think
it has to do with more keys fitting on the leaf pages.

As far as I can tell, the GiST index looks through the internal nodes,
then when it hits a leaf node, it searches through each of the keys in
the leaf.

I save time searching through the internal nodes (because there are
fewer of them) - this is an O(log n) saving.

I lose time searching through the leaves (because there are more keys
in a leaf) - this is an O(n) cost.
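
(Just to make the trade-off concrete, here's a tiny sketch of what I mean
by the per-leaf cost - the names and structure are made up for
illustration, this is not the actual GiST scan code:)

/*
 * Sketch only: the descent through internal pages is O(log n), but each
 * leaf page visited runs the key test once per index tuple on the page,
 * so packing more keys onto a leaf means more per-page work.
 */
typedef struct { float xmin, ymin, xmax, ymax; } KEY;

static int
scan_leaf_page(const KEY *keys, int nkeys, const KEY *query)
{
    int i, nmatches = 0;

    for (i = 0; i < nkeys; i++)     /* one overlap test per key on the page */
    {
        if (keys[i].xmin <= query->xmax && query->xmin <= keys[i].xmax &&
            keys[i].ymin <= query->ymax && query->ymin <= keys[i].ymax)
            nmatches++;
    }
    return nmatches;
}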

For a query that does a nested loop of 10,000 index scans (on the same
table), the old method takes about 2 seconds; the new "faster" method
takes about 20 seconds.

I'm still trying to verify that this is the actual reason why it's
slower - anyone have any other ideas? Anyone know a good way to profile
this?

dave

#2 David Blasby
dblasby@refractions.net
In reply to: David Blasby (#1)
Re: GiST -- making my index faster makes it slower

Couple of observations:

1. Are you sure your data is handled as 32 bit all the way through? Run
time casting will offset performance gains on 32 bit floats. Is your
comparison routine casting to double?

I thought this might be the case, but I expected the effect to be small.
The only place it might be doing hidden casts would be in statements
like "query->xmin <= key->xmax".

2. Math CPUs usually crunch at 80 bits, so you can't save much by using
32 bit floats, although cache locality will be better.

3. A 64 bit CPU will probably run better on 64 bit floats.

4. Is your dataset congested enough that you now have duplicate values
by loss of precision? This would of course impact performance. How
big is your dataset? How big is your avg. result set?

My test datasets are quite spatially separate, so I don't expect there
to be any "accidental" overlaps. There are about 12 million rows (in
total) in the datasets - I've only noticed about 2 of these overlaps.
My test datasets are 1,000 rows, 10,000 rows, 10,000,000 rows, and a few
different ones in the 200,000 row range.

I'm testing queries that return anywhere from 1 geometry to about 10,000.

The actual index search is only a few milliseconds longer using the
float32 bounding box for a mid-sized table returning a handful of rows.
When the result sets get bigger (or you start doing nested queries),
the performance differences become greater.

dave

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Blasby (#1)
Re: GiST -- making my index faster makes it slower

David Blasby <dblasby@refractions.net> writes:

The old way used a BOX (bounding box of 4 doubles) as the key in the index.
The new way uses a bounding box of 4 single-precision floats (it's only
16 bytes long - a BOX is 32).

It turns out this is not the case - it's significantly slower. I think
it has to do with more keys fitting on the leaf pages.

Hm. With twice as many keys per page, you'd think that it could be no
more than 2x slower, if the problem is one of O(n) search through the
keys on the page. That doesn't explain a 10x slowdown. Could it be
that some part of the GIST code is O(n^2), or worse, in the number of
keys per page? If so, perhaps it's fixable.

I'd suggest profiling the backend with both key types to get an idea of
where the time is going.

regards, tom lane

PS: actually, allowing for the 12-byte index tuple overhead, you
couldn't have even twice as many keys per page. So there's something
mighty odd here. Keep us posted.
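
(Rough arithmetic, assuming the 12 bytes covers the index tuple header
plus its line pointer, and that both key sizes are already maxaligned:

    32-byte key:  12 + 32 = 44 bytes per index entry
    16-byte key:  12 + 16 = 28 bytes per index entry

so the smaller key fits only about 44/28, roughly 1.6x, as many entries
per page, not 2x.)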

#4 David Blasby
dblasby@refractions.net
In reply to: Tom Lane (#3)
Re: GiST -- making my index faster makes it slower

Tom Lane wrote:

I'd suggest profiling the backend with both key types to get an idea of
where the time is going.

I've been trying to use gprof to do some profiling, but I'm having
trouble. What's the best way to profile?

PS: actually, allowing for the 12-byte index tuple overhead, you
couldn't have even twice as many keys per page. So there's something
mighty odd here. Keep us posted.

Using the old system, I'd get about 7 internal node hits and about 110
"leaf" hits. Under the new one, I get about 4 internal node hits and
about 160 "leaf" hits.

I'm just in the process of changing the key to:

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    char  junk[16];    /* make the 16 byte type into 32! */
} BOX2DFLOAT4;

To see what happens.

dave

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Blasby (#4)
Re: GiST -- making my index faster makes it slower

David Blasby <dblasby@refractions.net> writes:

Tom Lane wrote:

I'd suggest profiling the backend with both key types to get an idea of
where the time is going.

I've been trying to use gprof to do some profiling, but I'm having
trouble. What's the best way to profile?

It's not hard; the only real gotcha is that on Linux systems you need to
compile with -DLINUX_PROFILE so that the postmaster works around some
Linux brain damage with dropping the profile status at fork().
I usually do (in an already configured and built source tree)

cd src/backend
make clean
make PROFILE="-pg -DLINUX_PROFILE" all
make install-bin

Don't forget that the backends will drop their gmon.out files in
$PGDATA/base/DBOID/gmon.out.
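
(Then point gprof at the profiled binary plus that gmon.out - the install
path here is just an example:

    gprof /usr/local/pgsql/bin/postgres $PGDATA/base/DBOID/gmon.out > profile.txt

and look at the flat profile and call graph it prints.)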

regards, tom lane

#6 David Blasby
dblasby@refractions.net
In reply to: Tom Lane (#5)
Re: GiST -- making my index faster makes it slower

Humm -- strange results here:

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
} BOX2DFLOAT4;

This takes about 18,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    char  junk[16];
} BOX2DFLOAT4;

This takes about 15,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
    double xmin;
    double ymin;
    double xmax;
    double ymax;
} BOX2DFLOAT4;

This takes about 1500 ms to do a nested query with 10,000 iterations.
Yes - that's almost 14 seconds faster!

This doesn't make a lot of sense, since the only part of the GiST index
code being called that actually looks at the bounding boxes is this:

retval = (((key->xmax >= query->xmax) &&
           (key->xmin <= query->xmax)) ||
          ((query->xmax >= key->xmax) &&
           (query->xmin <= key->xmax)))
         &&
         (((key->ymax >= query->ymax) &&
           (key->ymin <= query->ymax)) ||
          ((query->ymax >= key->ymax) &&
           (query->ymin <= key->ymax)));

dave

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Blasby (#6)
Re: GiST -- making my index faster makes it slower

David Blasby <dblasby@refractions.net> writes:

Humm -- strange results here:
typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
} BOX2DFLOAT4;

This takes about 18,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    char  junk[16];
} BOX2DFLOAT4;

This takes about 15,000 ms to do a nested query with 10,000 iterations.

That is strange --- I'm still suspecting an O(n^2) bit of logic
somewhere. But the size of the discrepancy is a little bit more
reasonable. Could you profile these two cases and see where the extra
time goes?

typedef struct
{
    double xmin;
    double ymin;
    double xmax;
    double ymax;
} BOX2DFLOAT4;

This takes about 1500 ms to do a nested query with 10,000 iterations.
Yes - that's almost 14 seconds faster!

AFAICS there are only two possible explanations:

1. float-vs-float comparison is a lot slower than double-vs-double on
your hardware. Actually, the comparisons might be float-vs-double
(is the "query" struct the same as the "key"??). The compiler could
well be promoting one or both floats to double and then doing double
comparison, but even so, that's a heck of a large overhead.

2. The float bounding boxes are presumably slightly inaccurate. If they
are a tad too large then perhaps you are visiting more leaf pages than
you need to. (A worrisome possibility is that a float box could be
slightly too small, causing you to fail to visit an entry you should
:-()
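
(One way to rule out the too-small case, if you want to keep float4 keys -
just a sketch, with made-up helper names, of one possible approach: when
converting the double box to floats, round the mins down and the maxes up
to the next representable float, so the float box always contains the
double one:)

#include <math.h>
#include <float.h>

/* Largest float <= v, and smallest float >= v. */
static float
float_down(double v)
{
    float f = (float) v;

    if ((double) f > v)
        f = nextafterf(f, -FLT_MAX);
    return f;
}

static float
float_up(double v)
{
    float f = (float) v;

    if ((double) f < v)
        f = nextafterf(f, FLT_MAX);
    return f;
}

/* e.g.  key->xmin = float_down(box->xmin);  key->xmax = float_up(box->xmax); */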

regards, tom lane

#8 David Blasby
dblasby@refractions.net
In reply to: Tom Lane (#5)
Re: GiST -- making my index faster makes it slower

I tracked the problem down to the "penalty" function used to build the
tree. Basically, it compares the area of the bounding boxes.

There wasn't enough precision in the area calculations - instead of
giving 0.0 it would give numbers in the [+-]10^-6 range. This really
screwed up how the tree was being built.

I cast the floats to doubles when calculating area, and it all works
well now.
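
(For the archives, a sketch of what I mean - the function names and the
exact penalty formula here are illustrative, not the real PostGIS code:)

/* Do the area arithmetic in double even though the key stores float4,
 * so that identical boxes really give a penalty of exactly 0.0. */
static double
box2d_area(const BOX2DFLOAT4 *b)
{
    return ((double) b->xmax - (double) b->xmin) *
           ((double) b->ymax - (double) b->ymin);
}

/* GiST penalty: roughly, how much the existing key's area has to grow
 * to absorb the new entry. */
static double
box2d_penalty(const BOX2DFLOAT4 *orig, const BOX2DFLOAT4 *unioned)
{
    return box2d_area(unioned) - box2d_area(orig);
}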

dave