Performance optimization of btree binary search
Having nothing better to do over the holiday weekend, I decided to
pursue a number of ideas for improving performance that I thought
about a long time ago. These include:
* Pre-fetching list node pointers. This looks to be moderately
promising, but I'm certainly not going to be the one to land it, given
present obligations. Stephen Frost may wish to pick it up, given his
previous interest in the matter. This is slightly controversial,
because it uses a GCC intrinsic (__builtin_prefetch), but also because
the Linux kernel removed this optimization from their generic list
data structure [1]. However, that list was what we'd call an embedded
list, so we probably shouldn't be totally deterred. The amount
of effort that I put into this was, frankly, pretty low. A motivated
person, willing to do the appropriate analysis could probably bring it
further. For one thing, every single foreach() has a call to this
intrinsic, even where the list doesn't store pointers (which is not
undefined). At the very least that's going to bloat things up,
frequently for no conceivable gain, and yet with the patch applied
we're still able to see quite tangible benefits, even if it isn't
exactly a stellar improvement. I have an idea that prefetching the
last element at the start of the loop could be much better than what I
did, because we know that those lists are mostly pretty small in
practice, and that's likely to help pipelining - prefetching too late
or even too early makes the optimization useless, because you may
still get a cache miss.
* Optimizing index scans - I noticed that binary searching accounted
for many cache misses during a pgbench select.sql benchmark,
instrumented with "perf record -e cache-misses". This warranted
further investigation.
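For illustration, the shape of the first idea is roughly the following (a hypothetical stand-alone sketch, not the actual patch; PostgreSQL's real foreach() iterates over pg_list.h ListCells):

```c
/*
 * Hypothetical sketch of prefetching during linked-list traversal, in
 * the spirit of the first bullet point.  Before processing the current
 * cell, we ask the CPU to start fetching the next one, hoping to hide
 * some of the memory latency of the pointer chase.  __builtin_prefetch
 * is the GCC intrinsic mentioned above.
 */
#include <stddef.h>

typedef struct ListCell
{
    int              value;
    struct ListCell *next;
} ListCell;

static long
sum_list_prefetched(const ListCell *head)
{
    long             sum = 0;
    const ListCell  *cell;

    for (cell = head; cell != NULL; cell = cell->next)
    {
#if defined(__GNUC__)
        if (cell->next != NULL)
            __builtin_prefetch(cell->next); /* hint only; never faults */
#endif
        sum += cell->value;
    }
    return sum;
}
```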
I won't say anything further about the former optimization, except to
note that it's included for comparative purposes in the set of
benchmarks I've run (I haven't included a patch). The benchmark
results are here:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/results
I took two approaches to the latter. This was the more interesting
piece of work. Test sets include:
* Master baseline (green)
* List optimization (as mentioned above, not really relevant to the
main topic of this mail) (red)
* "fib btree", earlier patch, please disregard (blue)
* "Fixed fib patch", Fibonacci search, no specialization (purple)
* The interesting one, Fibonacci search + specialization - "fib + no
jump" (turquoise, see below for details)
Initially, I had a little bit of success with Fibonacci search [2] in
place of binary search, in the hope that it would better take
advantage of CPU cache characteristics - Fibonacci search is said to
have advantages where non-uniform memory access is an issue - it
minimizes the final interval. I wasn't all that optimistic that it
would work that well given the smallish size of BLCKSZ relative to
modern CPU L1 cache sizes [3], but it did make an appreciable dent on
its own. I suppose old habits die hard, because next I hacked up
_bt_compare and had it do an int4btcmp directly, in the event of
encountering a scankey that had as its comparator the relevant pg_proc
oid. This is very much in the style (and the spirit) of the grotty
early draft patches for the inlining-comparators-for-sorting patch.
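To make the comparison with binary search concrete, here is a minimal stand-alone sketch of Fibonacci search over a sorted int array (an illustration only; the patch operates on btree page items via _bt_compare, not a plain array):

```c
/*
 * Minimal sketch of Fibonacci search on a sorted int array.  Like
 * binary search it is O(log n), but probe points are chosen using
 * consecutive Fibonacci numbers rather than halving, which narrows
 * the final search interval differently and is sometimes claimed to
 * be friendlier to memory hierarchies.
 */
static int
fib_search(const int *arr, int n, int key)
{
    int     fib2 = 0,           /* (k-2)th Fibonacci number */
            fib1 = 1,           /* (k-1)th Fibonacci number */
            fib = fib2 + fib1;  /* kth Fibonacci number */
    int     offset = -1;        /* eliminated prefix of the array */

    /* find the smallest Fibonacci number >= n */
    while (fib < n)
    {
        fib2 = fib1;
        fib1 = fib;
        fib = fib2 + fib1;
    }

    while (fib > 1)
    {
        int     i = (offset + fib2 < n - 1) ? offset + fib2 : n - 1;

        if (arr[i] < key)
        {
            /* move the window up; step Fibonacci numbers down by one */
            fib = fib1;
            fib1 = fib2;
            fib2 = fib - fib1;
            offset = i;
        }
        else if (arr[i] > key)
        {
            /* stay in the lower part; step down by two */
            fib = fib2;
            fib1 = fib1 - fib2;
            fib2 = fib - fib1;
        }
        else
            return i;           /* exact match */
    }

    if (fib1 && offset + 1 < n && arr[offset + 1] == key)
        return offset + 1;
    return -1;                  /* not found */
}
```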
Patch is attached. This is a draft, a POC, posted only to facilitate
discussion and to allow my results to be independently
duplicated/verified. Note that there is a bug (attributable to the new
search code) that causes the regression tests to fail in exactly one
place (i.e. one line of difference). I didn't think it was worth
deferring discussion to deal with that, though, since I don't think it
undermines anything.
I'm not sure how valuable the comparator trick is if we stick with
binary search - I didn't test that. I'm sure it's a question that must
be considered, though.
I have a fairly huge amount of data here, having run plenty of
benchmarks over several nights. The short version is that the 'select'
benchmark has just over 18% greater throughput on this machine at some
client counts (in particular, when there are 4 clients - there are 4
cores, but 8 logical cores) with the attached patch. There is a 3.5%
regression with one client, which is certainly not accounted for by
noise. Note, however, that latency appears consistently better with
the patch applied. This is a machine running on dedicated hardware: a
4-core i7-4770. The working set easily fits in its 32GiB of DDR3 RAM
at all pgbench scales tested (1, 10 and 100). The kernel used is
"3.8.0-31-generic #46~precise1-Ubuntu SMP". Postgres settings are
typical for this kind of thing (6GiB shared_buffers), but you can
refer to my pgbench-tools results for full details (drill down to an
individual pgbench run for that - they're all the same). I'm kind of
curious as to what this benchmark would look like on a server with
many more cores.
I guess I could write a proper patch to have code setting up a scankey
also set a flag that indicated that it was acceptable to assume that
the special built-in comparator would do fine. I don't think that
anything as involved as the sortsupport infrastructure is required
here, because that presumed (perhaps questionably) that it was widely
useful to provide more direct comparators, and that an extensible
framework was warranted, whereas what I've done is likely to have very
limited type-applicability. In general, people don't often put indexes
on floating point columns (where we need to handle NaN comparisons in
a special way in order to provide behavior consistent with various
invariants that btree operator classes need to respect). But I'm
reasonably confident that the majority of primary indexes in the wild
are int4 or composites of int4, maybe even the large majority.
I'd be happy with a scheme that had only one built-in comparator, and
allowed a few types to be cataloged to indicate that just using the
"built-in" comparator was acceptable, knowledge that could be passed
to _bt_compare via the scankey. I'm thinking of just
int4, and maybe date and a few other such int4 "covariant" types.
I don't think that this has to be morally equivalent to violating the
btree/AM abstraction by giving it direct knowledge of types - rather,
I'm only proposing that we give it the facility to know that for the
purposes of scankey comparison, a given type (well, some scankey that
has been appropriately hinted by the index Relation or something like
that) has Datums that compare like int4 Datums. Theoretically, it has
nothing in particular to do with the cataloged int4 type as such - it
has more to do with the fact that a comparison of 32-bit integers is a
single instruction on popular ISAs. More practically speaking, this
optimization seems likely to appreciably help many common cases.
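A minimal sketch of what such a hinted fast path might look like (hypothetical stand-alone types and names; the real Datum and DatumGetInt32 live in postgres.h, and the real comparator would be reached from _bt_compare):

```c
/*
 * Sketch of the "compare Datums directly as int32" fast path discussed
 * above.  The comparator must return a negative, zero, or positive
 * result; the naive subtraction trick (a - b) is avoided because it
 * can overflow for int32 operands.
 */
#include <stdint.h>

typedef uintptr_t Datum;        /* stand-in for PostgreSQL's Datum */

static int32_t
datum_get_int32(Datum d)
{
    /* discard any unused high-order bits, like DatumGetInt32 */
    return (int32_t) d;
}

static int
fastcmp_int32(Datum a, Datum b)
{
    int32_t     va = datum_get_int32(a);
    int32_t     vb = datum_get_int32(b);

    if (va < vb)
        return -1;
    if (va > vb)
        return 1;
    return 0;
}
```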
Obviously, I'm not proposing that it be committed more or less as-is,
since it's entirely possible that the applicability of what I've done
isn't universal; I don't presume that it's never counter-productive.
How this could be most usefully applied is a discussion well worth
having, though. To be frank: hopefully we won't have a protracted
discussion this time. This stuff is not a high priority for me these
days.
Having mentioned CPU instructions, I should also say: as with the
sorting stuff, this optimization has little to nothing to do with
shaving some instructions from not having to go through the fmgr
trampoline. The true cost for that trampoline is almost certainly paid
in cache misses and, perhaps to a lesser extent, missed opportunities
for successful branch prediction.
Thoughts?
[1]: http://lwn.net/Articles/444336/
[2]: http://www.mpri.lsu.edu/textbook/Chapter5-a.htm#fibonacci
[3]: http://www.pvk.ca/Blog/2012/07/30/binary-search-is-a-pathological-case-for-caches/
--
Peter Geoghegan
Attachments:
fibonacci_search_proof_of_concept.2013_12_04.patch (text/x-patch, +230/-88)
Peter Geoghegan <pg@heroku.com> writes:
I guess I could write a proper patch to have code setting up a scankey
also set a flag that indicated that it was acceptable to assume that
the special built-in comparator would do fine. ...
I'd be happy with a scheme that had only one built-in comparator, and
allowed a few types to be cataloged to indicate that just using the
"built-in" comparator was acceptable, knowledge that could be passed
to _bt_compare via the scankey. I'm thinking of just
int4, and maybe date and a few other such int4 "covariant" types.
If what you're proposing is that we have a fast path that compares Datums
as Datums, I should think that that would work fine for int2 as well,
*and* for int8 on machines where int8 is pass-by-value. (Does anyone
still care much about PG's performance on 32-bit hardware?) We might
have to fool a bit with the fooGetDatum macros in some cases, eg
I think Int16GetDatum isn't careful about sign extension. Admittedly,
that might introduce an offsetting cost on some hardware, but I think
on most machines sign-extension isn't noticeably more expensive than
zero-extension.
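Tom's sign-extension point can be illustrated with stand-alone macros (hypothetical names; the real Int16GetDatum is in postgres.h): if the conversion merely zero-extends, a negative int16 acquires a huge unsigned image and sorts after positive values when the Datum is compared directly.

```c
/*
 * Illustration of the sign-extension concern.  A zero-extending
 * int16-to-Datum conversion breaks direct signed comparison of the
 * resulting Datums; a sign-extending one preserves the ordering.
 */
#include <stdint.h>

typedef uintptr_t Datum;        /* stand-in for PostgreSQL's Datum */

/* zero-extending: -1 becomes 0xFFFF, which compares greater than 1 */
#define Int16GetDatumZeroExt(x)  ((Datum) (uint16_t) (x))

/* sign-extending: -1 becomes an all-ones Datum, i.e. -1 as intptr_t */
#define Int16GetDatumSignExt(x)  ((Datum) (intptr_t) (int16_t) (x))

static int
datum_signed_cmp(Datum a, Datum b)
{
    intptr_t    va = (intptr_t) a;
    intptr_t    vb = (intptr_t) b;

    return (va < vb) ? -1 : (va > vb) ? 1 : 0;
}
```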
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 4, 2013 at 4:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@heroku.com> writes:
I guess I could write a proper patch to have code setting up a scankey
also set a flag that indicated that it was acceptable to assume that
the special built-in comparator would do fine. ...
I'd be happy with a scheme that had only one built-in comparator, and
allowed a few types to be cataloged to indicate that just using the
"built-in" comparator was acceptable, knowledge that could be passed
to _bt_compare via the scankey. I'm thinking of just
int4, and maybe date and a few other such int4 "covariant" types.
If what you're proposing is that we have a fast path that compares Datums
as Datums, I should think that that would work fine for int2 as well,
*and* for int8 on machines where int8 is pass-by-value. (Does anyone
still care much about PG's performance on 32-bit hardware?) We might
have to fool a bit with the fooGetDatum macros in some cases, eg
I think Int16GetDatum isn't careful about sign extension. Admittedly,
that might introduce an offsetting cost on some hardware, but I think
on most machines sign-extension isn't noticeably more expensive than
zero-extension.
Yeah, I think if we can make something like this work, it would be
neat-o. Getting this working for int4 would be a good win, as Peter
says, but getting it working for both int4 and int8 with the same code
would be a significantly better one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 4, 2013 at 1:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, I think if we can make something like this work, it would be
neat-o. Getting this working for int4 would be a good win, as Peter
says, but getting it working for both int4 and int8 with the same code
would be a significantly better one.
No arguments here. I think I didn't initially suggest it myself out of
passing concern about the guarantees around how unused Datum bits are
initialized in all relevant contexts, but having looked at it for a
second I see that we are of course disciplined there.
--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
On Wed, Dec 4, 2013 at 1:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, I think if we can make something like this work, it would be
neat-o. Getting this working for int4 would be a good win, as Peter
says, but getting it working for both int4 and int8 with the same code
would be a significantly better one.
No arguments here. I think I didn't initially suggest it myself out of
passing concern about the guarantees around how unused Datum bits are
initialized in all relevant contexts, but having looked at it for a
second I see that we are of course disciplined there.
Hm ... actually, the comment at lines 335ff of postgres.h points out that
a Datum returned from a version 0 user-defined function might contain
garbage in the high order bits. We could fix that, probably, with some
cleanup code added to fmgr_oldstyle. It'd waste a few cycles ... but if
there's anybody out there who still cares about the performance of such
functions, it's high time they fixed them to be v1, anyway.
regards, tom lane
On Wed, Dec 4, 2013 at 6:33 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Dec 4, 2013 at 1:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, I think if we can make something like this work, it would be
neat-o. Getting this working for int4 would be a good win, as Peter
says, but getting it working for both int4 and int8 with the same code
would be a significantly better one.
No arguments here. I think I didn't initially suggest it myself out of
passing concern about the guarantees around how unused Datum bits are
initialized in all relevant contexts, but having looked at it for a
second I see that we are of course disciplined there.
Hmm. And yet, there's this:
* When a type narrower than Datum is stored in a Datum, we place it in the
* low-order bits and are careful that the DatumGetXXX macro for it discards
* the unused high-order bits (as opposed to, say, assuming they are zero).
* This is needed to support old-style user-defined functions, since depending
* on architecture and compiler, the return value of a function returning char
* or short may contain garbage when called as if it returned Datum.
And record_image_eq does a rather elaborate dance around here, calling
the appropriate GET_x_BYTES macro depending on the type-width. If we
can really count on the high-order bits to be zero, that's all
completely unnecessary tomfoolery.
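The masking dance in question looks roughly like this (a stand-alone sketch; GET_1_BYTE and friends are real postgres.h macros, but the surrounding function is simplified from what record_image_eq actually does):

```c
/*
 * Sketch of width-dependent Datum comparison.  Masking to the type's
 * width makes the equality check immune to garbage in a Datum's unused
 * high-order bits, which is exactly the hazard under discussion.
 */
#include <stdint.h>

typedef uintptr_t Datum;        /* stand-in for PostgreSQL's Datum */

#define GET_1_BYTE(d)   ((d) & 0xFFUL)
#define GET_2_BYTES(d)  ((d) & 0xFFFFUL)
#define GET_4_BYTES(d)  ((d) & 0xFFFFFFFFUL)

static int
image_eq_by_width(Datum a, Datum b, int typlen)
{
    switch (typlen)
    {
        case 1:
            return GET_1_BYTE(a) == GET_1_BYTE(b);
        case 2:
            return GET_2_BYTES(a) == GET_2_BYTES(b);
        case 4:
            return GET_4_BYTES(a) == GET_4_BYTES(b);
        default:
            return a == b;      /* full-width Datum */
    }
}
```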
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Hmm. And yet, there's this:
* When a type narrower than Datum is stored in a Datum, we place it in the
* low-order bits and are careful that the DatumGetXXX macro for it discards
* the unused high-order bits (as opposed to, say, assuming they are zero).
* This is needed to support old-style user-defined functions, since depending
* on architecture and compiler, the return value of a function returning char
* or short may contain garbage when called as if it returned Datum.
And record_image_eq does a rather elaborate dance around here, calling
the appropriate GET_x_BYTES macro depending on the type-width. If we
can really count on the high-order bits to be zero, that's all
completely unnecessary tomfoolery.
Yeah, that's another thing we could simplify if we fixed this problem
at the source. I think these decisions date from a time when we still
cared about the speed of fmgr_oldstyle.
regards, tom lane
I wrote:
Yeah, that's another thing we could simplify if we fixed this problem
at the source. I think these decisions date from a time when we still
cared about the speed of fmgr_oldstyle.
BTW, the text you're quoting is from 2007, but it's just documenting
behavior that's mostly a lot older. It's worth reading commit 23a41573
in toto in this connection. I'm not sure if we'd want to revert that
DatumGetBool change or not, if we were to clean up fmgr_oldstyle.
We'd be able to do whatever was cheapest, but I'm not sure what that is.
regards, tom lane
On Wed, Dec 4, 2013 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Hmm. And yet, there's this:
* When a type narrower than Datum is stored in a Datum, we place it in the
* low-order bits and are careful that the DatumGetXXX macro for it discards
* the unused high-order bits (as opposed to, say, assuming they are zero).
* This is needed to support old-style user-defined functions, since depending
* on architecture and compiler, the return value of a function returning char
* or short may contain garbage when called as if it returned Datum.
And record_image_eq does a rather elaborate dance around here, calling
the appropriate GET_x_BYTES macro depending on the type-width. If we
can really count on the high-order bits to be zero, that's all
completely unnecessary tomfoolery.
Yeah, that's another thing we could simplify if we fixed this problem
at the source. I think these decisions date from a time when we still
cared about the speed of fmgr_oldstyle.
Sure, let's whack that thing with a crowbar.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 2013-12-04 at 19:45 -0500, Robert Haas wrote:
On Wed, Dec 4, 2013 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, that's another thing we could simplify if we fixed this problem
at the source. I think these decisions date from a time when we still
cared about the speed of fmgr_oldstyle.
Sure, let's whack that thing with a crowbar.
Or just remove it. Who still needs it?
Peter Eisentraut <peter_e@gmx.net> writes:
On Wed, 2013-12-04 at 19:45 -0500, Robert Haas wrote:
On Wed, Dec 4, 2013 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, that's another thing we could simplify if we fixed this problem
at the source. I think these decisions date from a time when we still
cared about the speed of fmgr_oldstyle.
Sure, let's whack that thing with a crowbar.
Or just remove it. Who still needs it?
Lazy people? I'm not in a hurry to drop it; it's not costing us much to
just sit there, other than in this connection which we see how to fix.
regards, tom lane
On Wed, Dec 4, 2013 at 12:58 PM, Peter Geoghegan <pg@heroku.com> wrote:
I'm kind of
curious as to what this benchmark would look like on a server with
many more cores.
I'm also curious about the impact on insertion into primary key
indexes. Presently, we hold an exclusive buffer lock for the duration
of a couple of operations when checkUnique != UNIQUE_CHECK_NO.
_bt_binsrch() is one such operation. The other one there,
_bt_check_unique(), is likely to be a lot cheaper than _bt_binsrch()
on average, I think, so I'm cautiously optimistic that it'll be
noticeable. I better go and check it out.
--
Peter Geoghegan
On Wed, Dec 4, 2013 at 5:28 PM, Peter Geoghegan <pg@heroku.com> wrote:
I'm also curious about the impact on insertion into primary key
indexes. Presently, we hold an exclusive buffer lock for the duration
of a couple of operations when checkUnique != UNIQUE_CHECK_NO.
_bt_binsrch() is one such operation. The other one there,
_bt_check_unique(), is likely to be a lot cheaper than _bt_binsrch()
on average, I think, so I'm cautiously optimistic that it'll be
noticeable. I better go and check it out.
Depending on how well this goes, I might also teach _bt_doinsert() to
hint to _bt_binsrch() (or as I'm calling it, _bt_page_search()) that
it should look to the end of the page when searching, using a similar
mechanism to the mechanism for hinting that the main Datum-compare
optimization is applicable (this strategy would be abandoned if it
didn't work immediately - as soon as the last item on the page turned
out to be greater than or equal to the scankey value). This is
something that I think would help with SERIAL columns, where it's
possible in principle to pass that kind of insight around -- if you
can live with making SERIAL more than mere syntactic sugar.
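A toy version of that hinting strategy, over a plain sorted array (hypothetical names; the real code would work in terms of _bt_compare and page item offsets):

```c
/*
 * Sketch of the "check the page's last item first" hint for rightmost
 * insertions.  For serial keys, new values usually exceed everything
 * already on the page, so one comparison can often replace an entire
 * binary search.  If the hint fails, we fall back immediately.
 */
static int
search_with_rightmost_hint(const int *items, int n, int key)
{
    /* try the hint: does key sort after the last (greatest) item? */
    if (n > 0 && key > items[n - 1])
        return n;               /* insertion position at the end */

    /* hint failed: ordinary binary search for the leftmost position
     * where items[pos] >= key */
    int     lo = 0,
            hi = n;

    while (lo < hi)
    {
        int     mid = lo + (hi - lo) / 2;

        if (items[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```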
--
Peter Geoghegan
On Wed, 2013-12-04 at 20:27 -0500, Tom Lane wrote:
Lazy people? I'm not in a hurry to drop it; it's not costing us much to
just sit there, other than in this connection which we see how to fix.
Actually, I think it probably costs a fair portion of extension authors
when their initial code crashes because they forgot to declare all their
functions V1. I think it might actually be more of a bother to lazy
people than a benefit.
Peter Eisentraut <peter_e@gmx.net> writes:
On Wed, 2013-12-04 at 20:27 -0500, Tom Lane wrote:
Lazy people? I'm not in a hurry to drop it; it's not costing us much to
just sit there, other than in this connection which we see how to fix.
Actually, I think it probably costs a fair portion of extension authors
when their initial code crashes because they forgot to declare all their
functions V1. I think it might actually be more of a bother to lazy
people than a benefit.
Hm. We have heard one or two complaints like that, but not a huge number.
I'm worried about breaking code that's been working since god-knows-when;
but I will concede there's little evidence that there's very much of that
out there either.
regards, tom lane
On Wed, Dec 4, 2013 at 5:28 PM, Peter Geoghegan <pg@heroku.com> wrote:
I'm also curious about the impact on insertion into primary key
indexes. Presently, we hold an exclusive buffer lock for the duration
of a couple of operations when checkUnique != UNIQUE_CHECK_NO.
_bt_binsrch() is one such operation. The other one there,
_bt_check_unique(), is likely to be a lot cheaper than _bt_binsrch()
on average, I think, so I'm cautiously optimistic that it'll be
noticeable. I better go and check it out.
I did a relatively short variant 'insert' pgbench-tools benchmark,
with a serial primary index created on the pgbench_history table.
Results:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/insert/
Note that in the interest of increasing the signal to noise ratio, I
used unlogged tables. These are considerably shorter two minute runs,
of inserts/writes, but even still it's interesting to compare it to my
original 'select' benchmark. Checkpoint settings were very high.
This looks to be about a wash. I'm not that surprised, because this is
all about memory bandwidth, not lock contention. Even with the lock
contention on the rightmost btree page, we top out at about the same
TPS level as the 'select' benchmark (at least compared to Postgres
master). However, we top out at that level much sooner here (growth is
much steeper), because the contention is mostly on the same one or two
btree leaf pages, which are well cached by CPU L1 cache. Writes are
actually quite a bit faster than uniformly-distributed reads, even
with the exclusive buffer locking (and lock contention) necessitated
by writing. You might even say that my patch makes the first benchmark
look more like this one (a less awkward comparison could probably be
made between what I've done for uniform distribution read workloads,
and a read workload with far more uneven data access, which will
naturally have fewer cache misses and less TLB contention). I strongly
suspect I wouldn't have seen a significant number of cache misses when
binary searching on btree pages if I was to instrument this workload.
It might still be worth pursuing the SERIAL hinting mechanism, but
it's not going to be nearly as big a win as what I've done for reads,
I think. It might also be worth looking at a workload with uniformly
distributed btree index tuple insertions, where I'd expect similar
advantages to those originally shown for reads.
--
Peter Geoghegan
On 2013-12-04 18:48:44 -0500, Robert Haas wrote:
* When a type narrower than Datum is stored in a Datum, we place it in the
* low-order bits and are careful that the DatumGetXXX macro for it discards
* the unused high-order bits (as opposed to, say, assuming they are zero).
* This is needed to support old-style user-defined functions, since depending
* on architecture and compiler, the return value of a function returning char
* or short may contain garbage when called as if it returned Datum.
And record_image_eq does a rather elaborate dance around here, calling
the appropriate GET_x_BYTES macro depending on the type-width. If we
can really count on the high-order bits to be zero, that's all
completely unnecessary tomfoolery.
I don't think we can get rid of that dance in record_image_eq - it very
well could be used on records originally generated when those bits haven't
been guaranteed to be zeroed.
Other usecases where the appropriate DatumGetFoo() macros are used don't
have a problem with that since it's cleared again there, but in
record_image_eq we can't rely on that.
Or am I missing something?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/05/2013 07:30 AM, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
On Wed, 2013-12-04 at 20:27 -0500, Tom Lane wrote:
Lazy people? I'm not in a hurry to drop it; it's not costing us much to
just sit there, other than in this connection which we see how to fix.
Actually, I think it probably costs a fair portion of extension authors
when their initial code crashes because they forgot to declare all their
functions V1. I think it might actually be more of a bother to lazy
people than a benefit.
Hm. We have heard one or two complaints like that, but not a huge number.
It happens to me about 75% of the time when I write a new C function.
These days, I usually realize it pretty quickly.
I wonder how easy it would be to make the compiler produce a warning
about it. Or issue a warning in PostgreSQL when you do CREATE FUNCTION
and the C function appears to be a V0 function.
I'm worried about breaking code that's been working since god-knows-when;
but I will concede there's little evidence that there's very much of that
out there either.
I wouldn't mind just dropping the V0 support. It's not difficult to
modify your function for V1 convention, so if there's still anyone out
there using V0, it wouldn't be unreasonable to require them to update.
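For reference, the V1 convention under discussion looks like this (a fragment that only compiles inside a PostgreSQL extension build; the macro names are the real ones from fmgr.h, though the function itself is a made-up example). It's the PG_FUNCTION_INFO_V1 line that people forget:

```c
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* Without this declaration, the server assumes the old V0 calling
 * convention and will typically crash when it calls the function. */
PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
    int32   arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}
```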
- Heikki
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-12-04 18:48:44 -0500, Robert Haas wrote:
And record_image_eq does a rather elaborate dance around here, calling
the appropriate GET_x_BYTES macro depending on the type-width. If we
can really count on the high-order bits to be zero, that's all
completely unnecessary tomfoolery.
I don't think we can get rid of that dance in record_image_eq - it very
well could be used on records originally generated when those bits haven't
been guaranteed to be zeroed.
No, you're failing to think about the context here. A Datum is not an
on-disk concept, only an in-memory one. In the case of record_image_eq,
simplifying the comparison of by-value Datums is unconditionally safe
as long as heap_deform_tuple is consistent about using the proper
GetDatum macros, which it is. (So actually, that "elaborate dance"
is a 100% waste of effort today, regardless of any changes we're
discussing here.)
The risk we take by simplifying comparisons in a more general context
is that some function somewhere might've been sloppy about doing a
native-type-to-Datum conversion on its result. In the case of V0
functions that risk is unavoidable except by adding some explicit cleanup
code, but I'm a bit worried that somebody, particularly third-party code,
might've sloppily written "return foo" in a V1 function when "return
Int32GetDatum(foo)" would be correct. In that case, the resultant Datum
might have not-per-spec high-order bits, and if it reaches the fast
comparator without ever having been squeezed into a physical tuple,
we've got a problem.
I would certainly regard this as a bug in that function, but nonetheless
it's a hazard that we need to set against any performance improvement
that we can buy this way.
regards, tom lane
On 2013-12-05 08:58:55 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I don't think we can get rid of that dance in record_image_eq - it very
well could be used on records originally generated when those bits haven't
been guaranteed to be zeroed.
No, you're failing to think about the context here.
Ah yes. I was completely forgetting that
heap_form_tuple()/heap_fill_tuple() will properly take care to only use
meaningful parts of the (to-be-)stored data, not random padding.
Thanks.
The risk we take by simplifying comparisons in a more general context
is that some function somewhere might've been sloppy about doing a
native-type-to-Datum conversion on its result. In the case of V0
functions that risk is unavoidable except by adding some explicit cleanup
code, but I'm a bit worried that somebody, particularly third-party code,
might've sloppily written "return foo" in a V1 function when "return
Int32GetDatum(foo)" would be correct. In that case, the resultant Datum
might have not-per-spec high-order bits, and if it reaches the fast
comparator without ever having been squeezed into a physical tuple,
we've got a problem.
Too bad V1 hasn't insisted on using PG_RETURN_* macros. That would have
allowed asserts checking against such cases by setting
fcinfo->has_returned = true or such...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services