char() overhead on read-only workloads not so insignifcant as the docs claim it is...
I'm currently doing some benchmarking on a Nehalem
box(http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html)
with 8.4 and while investigating what looks like issues in pgbench I
also noticed that using char() has more than a negligable overhead on
some (very special) readonly(!) workloads.
for example running sysbench in read-only mode against 8.4 results in a
profile(for the full run) that looks similiar to:
samples % symbol name
981690 11.0656 bcTruelen
359183 4.0487 index_getnext
311128 3.5070 AllocSetAlloc
272330 3.0697 hash_search_with_hash_value
258157 2.9099 LWLockAcquire
195673 2.2056 _bt_compare
190303 2.1451 slot_deform_tuple
168101 1.8948 PostgresMain
164191 1.8508 _bt_checkkeys
126110 1.4215 FunctionCall2
123965 1.3973 SearchCatCache
120629 1.3597 LWLockRelease
the default sysbench mode actually uses a number of different queries
and the ones dealing with char() are actually only a small part of the
full set of queries sent.
The specific query is causing bcTruelen to show up in the profile is:
"SELECT c from sbtest where id between $1 and $2 order by c" where the
parameters are for example
$1 = '5009559', $2 = '5009658' - ie ranges of 100.
benchmarking only that query results in:
samples % symbol name
2148182 23.5861 bcTruelen
369463 4.0565 index_getnext
362784 3.9832 AllocSetAlloc
284198 3.1204 slot_deform_tuple
185279 2.0343 _bt_checkkeys
180119 1.9776 LWLockAcquire
172733 1.8965 appendBinaryStringInfo
144158 1.5828 internal_putbytes
141040 1.5486 AllocSetFree
138093 1.5162 printtup
124255 1.3643 hash_search_with_hash_value
117054 1.2852 heap_form_minimal_tuple
at around 46000 queries/s
changing the fault sysbench schema from:
Table "public.sbtest"
Column | Type | Modifiers
--------+----------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character(120) | not null default ''::bpchar
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)
to
Table "public.sbtest"
Column | Type | Modifiers
--------+-------------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character varying | not null default ''::character varying
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)
results in a near 50%(!) speedup in terms of tps to around 67000
queries/s. This is however an extreme case because the c column actually
contains no data at all (except for an empty string).
the profile for the changed testcase looks like:
430797 5.2222 index_getnext
396750 4.8095 AllocSetAlloc
345508 4.1883 slot_deform_tuple
228222 2.7666 appendBinaryStringInfo
227766 2.7610 _bt_checkkeys
193818 2.3495 LWLockAcquire
179925 2.1811 internal_putbytes
168871 2.0471 printtup
152026 1.8429 AllocSetFree
146333 1.7739 heap_form_minimal_tuple
144305 1.7493 FunctionCall2
128320 1.5555 hash_search_with_hash_value
at the very least we should reconsider this part of our docs:
" There is no performance difference between these three types, apart
from increased storage space when using the blank-padded type, and a few
extra CPU cycles to check the length when storing into a
length-constrained column."
from http://www.postgresql.org/docs/8.4/static/datatype-character.html
regards
Stefan
Comments?
On Sat, Jun 13, 2009 at 3:44 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
I'm currently doing some benchmarking on a Nehalem box(
http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html)
with 8.4 and while investigating what looks like issues in pgbench I also
noticed that using char() has more than a negligable overhead on some (very
special) readonly(!) workloads.for example running sysbench in read-only mode against 8.4 results in a
profile(for the full run) that looks similiar to:samples % symbol name
981690 11.0656 bcTruelen
359183 4.0487 index_getnext
311128 3.5070 AllocSetAlloc
272330 3.0697 hash_search_with_hash_value
258157 2.9099 LWLockAcquire
195673 2.2056 _bt_compare
190303 2.1451 slot_deform_tuple
168101 1.8948 PostgresMain
164191 1.8508 _bt_checkkeys
126110 1.4215 FunctionCall2
123965 1.3973 SearchCatCache
120629 1.3597 LWLockReleasethe default sysbench mode actually uses a number of different queries and
the ones dealing with char() are actually only a small part of the full set
of queries sent.
The specific query is causing bcTruelen to show up in the profile is:"SELECT c from sbtest where id between $1 and $2 order by c" where the
parameters are for example
$1 = '5009559', $2 = '5009658' - ie ranges of 100.benchmarking only that query results in:
samples % symbol name
2148182 23.5861 bcTruelen
369463 4.0565 index_getnext
362784 3.9832 AllocSetAlloc
284198 3.1204 slot_deform_tuple
185279 2.0343 _bt_checkkeys
180119 1.9776 LWLockAcquire
172733 1.8965 appendBinaryStringInfo
144158 1.5828 internal_putbytes
141040 1.5486 AllocSetFree
138093 1.5162 printtup
124255 1.3643 hash_search_with_hash_value
117054 1.2852 heap_form_minimal_tupleat around 46000 queries/s
changing the fault sysbench schema from:
Table "public.sbtest"
Column | Type | Modifiers--------+----------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character(120) | not null default ''::bpchar
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)to
Table "public.sbtest"
Column | Type | Modifiers--------+-------------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character varying | not null default ''::character varying
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)results in a near 50%(!) speedup in terms of tps to around 67000 queries/s.
This is however an extreme case because the c column actually contains no
data at all (except for an empty string).the profile for the changed testcase looks like:
430797 5.2222 index_getnext
396750 4.8095 AllocSetAlloc
345508 4.1883 slot_deform_tuple
228222 2.7666 appendBinaryStringInfo
227766 2.7610 _bt_checkkeys
193818 2.3495 LWLockAcquire
179925 2.1811 internal_putbytes
168871 2.0471 printtup
152026 1.8429 AllocSetFree
146333 1.7739 heap_form_minimal_tuple
144305 1.7493 FunctionCall2
128320 1.5555 hash_search_with_hash_valueat the very least we should reconsider this part of our docs:
" There is no performance difference between these three types, apart from
increased storage space when using the blank-padded type, and a few extra
CPU cycles to check the length when storing into a length-constrained
column."from http://www.postgresql.org/docs/8.4/static/datatype-character.html
regards
Stefan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Lets call it Postgres
EnterpriseDB http://www.enterprisedb.com
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device
On Sat, Jun 13, 2009 at 3:44 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
The specific query is causing bcTruelen to show up in the profile is:
"SELECT c from sbtest where id between $1 and $2 order by c" where the
parameters are for example
$1 = '5009559', $2 = '5009658' - ie ranges of 100.benchmarking only that query results in:
samples % symbol name
2148182 23.5861 bcTruelen
369463 4.0565 index_getnext
362784 3.9832 AllocSetAlloc
Gurjeet Singh escribi�:
Comments?
Maybe bcTruelen could be optimized to step on one word at a time
(perhaps by using XOR against a precomputed word filled with ' '),
instead of one byte at a time ...
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro,
Maybe bcTruelen could be optimized to step on one word at a time
(perhaps by using XOR against a precomputed word filled with ' '),
instead of one byte at a time ...
I have a patch for this, will send soon.
Regards,
Jeremy
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
---
src/backend/utils/adt/varchar.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 5f3c658..6889dff 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -624,16 +624,34 @@ varchartypmodout(PG_FUNCTION_ARGS)
static int
bcTruelen(BpChar *arg)
{
+ const unsigned int spaces = 0x20202020;
+ const int wordsize = sizeof(spaces);
char *s = VARDATA_ANY(arg);
int i;
- int len;
- len = VARSIZE_ANY_EXHDR(arg);
- for (i = len - 1; i >= 0; i--)
+ i = VARSIZE_ANY_EXHDR(arg) - 1;
+
+ /* compare down to an aligned boundary */
+ for (; i >= 0 && i % wordsize != wordsize - 1; i--)
{
if (s[i] != ' ')
+ return i + 1;
+ }
+
+ /* now that we're aligned, compare word at a time */
+ for (; i >= wordsize - 1; i -= wordsize)
+ {
+ if (*(unsigned int *)(s + i - (wordsize - 1)) != spaces)
break;
}
+
+ /* check within the last non-matching word */
+ for (; i >= 0; i--)
+ {
+ if (s[i] != ' ')
+ break;
+ }
+
return i + 1;
}
On Jun 15, 2009, at 9:04 PM, Jeremy Kerr <jk@ozlabs.org> wrote:
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
---
src/backend/utils/adt/varchar.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/ varchar.c index 5f3c658..6889dff 100644 --- a/src/backend/utils/adt/varchar.c +++ b/src/backend/utils/adt/varchar.c @@ -624,16 +624,34 @@ varchartypmodout(PG_FUNCTION_ARGS) static int bcTruelen(BpChar *arg) { + const unsigned int spaces = 0x20202020; + const int wordsize = sizeof(spaces);
This looks very non-portable to me.
...Robert
Robert,
This looks very non-portable to me.
Unsurprisingly, I'm new to postgres hacking and the large number of
supported platforms :)
I was considering something like:
unsigned int spaces;
const unsigned int wordsize = sizeof(unsigned int);
memset(&spaces, ' ', wordsize);
In most cases, the compiler should be able to optimise the memset out,
but it may introduce overhead where this is not possible.
However, are there any supported platforms where sizeof(unsigned int) !=
4 ?
Cheers,
Jeremy
Jeremy,
* Jeremy Kerr (jk@ozlabs.org) wrote:
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
---
src/backend/utils/adt/varchar.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
Thanks for the contribution. A couple of comments:
The documentation for submitting a patch to PostgreSQL is here:
http://wiki.postgresql.org/wiki/Submitting_a_Patch
There is also a Developer FAQ available here:
http://wiki.postgresql.org/wiki/Developer_FAQ
The PostgreSQL core folks prefer context diffs (it's not my preference,
but I'm not part of core, nor am I a committer :).
There are a number of things requested to be included with a patch, but
in particular I would point out:
----
Which CVS branch the patch is against (ordinarily this will be HEAD).
Whether it compiles and tests successfully, so we know nothing obvious
is broken.
Whether it contains any platform-specific items and if so, has it been
tested on other platforms.
Describe the effect your patch has on performance, if any. If the patch
is intended to improve performance, it's a good idea to include some
reproducible tests to demonstrate the improvement.
----
You might check out sections 3 & 6 of src/include/c.h. Section 3
defines standard system types, while section 6 defines some widely
useful macros; in particular our custom MemSet and MemSetAligned, which
work on aligned memory structures for improved performance.
Thanks,
Stephen
On Mon, Jun 15, 2009 at 9:51 PM, Jeremy Kerr<jk@ozlabs.org> wrote:
I was considering something like:
unsigned int spaces;
const unsigned int wordsize = sizeof(unsigned int);memset(&spaces, ' ', wordsize);
In most cases, the compiler should be able to optimise the memset out,
but it may introduce overhead where this is not possible.
What about just:
static char spaces[4] = { ' ', ' ', ' ', ' ' };
and then ... * (uint32 *) spaces?
There's not much point taking the length of the word when you've
initialized it to contain exactly 4 bytes. What you want to do is
pick the flavor of integer that will be the same length as what you've
initialized, and it turns out we already have that (see
src/include/c.h).
As I look at this, another problem is that it seems to me that you're
assuming that VARDATA_ANY() will return an aligned pointer, which
isn't necessarily the case (see src/include/postgres.h).
The advice in Stephen's email is also very good - in particular,
whatever you come up with, you should submit performance results.
Note that while --enable-profiling is very useful and profiling
numbers are good to submit, you'll also want to make sure you do a
build that is optimized for speed (i.e. no profiling, no casserts, no
debug) and do timings on that.
...Robert
Robert Haas wrote:
The advice in Stephen's email is also very good - in particular,
whatever you come up with, you should submit performance results.
Note that while --enable-profiling is very useful and profiling
numbers are good to submit, you'll also want to make sure you do a
build that is optimized for speed (i.e. no profiling, no casserts, no
debug) and do timings on that.
I'm particularly interested to see how much the patch hurts performance
in the cases where it's not helping.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
* Robert Haas (robertmhaas@gmail.com) wrote:
As I look at this, another problem is that it seems to me that you're
assuming that VARDATA_ANY() will return an aligned pointer, which
isn't necessarily the case (see src/include/postgres.h).
I believe you need to look at it more carefully. I don't think it's
making any such assumption. Specifically, it has three loops; an "until
we're aligned" loop, then a "while we're aligned", and a "when we've
done all the aligned we could do".
On the flip side, I am curious as to if the arguments to a stored
procedure are always aligned or not. Never had a case to care before,
but if palloc() is always going to return an aligned chunk of memory
(per MemSetAligned in c.h) it makes me wonder.
Thanks,
Stephen
On Tue, Jun 16, 2009 at 6:30 AM, Stephen Frost<sfrost@snowman.net> wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
As I look at this, another problem is that it seems to me that you're
assuming that VARDATA_ANY() will return an aligned pointer, which
isn't necessarily the case (see src/include/postgres.h).I believe you need to look at it more carefully. I don't think it's
making any such assumption. Specifically, it has three loops; an "until
we're aligned" loop, then a "while we're aligned", and a "when we've
done all the aligned we could do".
I see that... but I don't think the test in the first loop is correct.
It's based on the value of i % 4, but I'm not convinced that you know
anything about the alignment at the point where i == 0.
I might be all wet here, I haven't looked at this area of the code in detail.
On the flip side, I am curious as to if the arguments to a stored
procedure are always aligned or not. Never had a case to care before,
but if palloc() is always going to return an aligned chunk of memory
(per MemSetAligned in c.h) it makes me wonder.
Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
varlena header...
...Robert
On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas<robertmhaas@gmail.com> wrote:
I see that... but I don't think the test in the first loop is correct.
It's based on the value of i % 4, but I'm not convinced that you know
anything about the alignment at the point where i == 0.
That's correct. To check the alignment you would have to look at the
actual pointer. I would suggest using the existing macros to handle
alignment. Hm, though the only one I see offhand which is relevant is
the moderately silly PointerIsAligned(). Still it would make the code
clearer even if it's pretty simple.
Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
think, bogus. There would be no alignment guarantee on that array.
Personally I'm find with 0x20202020 with a comment explaining what it
is.
* Robert Haas (robertmhaas@gmail.com) wrote:
I see that... but I don't think the test in the first loop is correct.
It's based on the value of i % 4, but I'm not convinced that you know
anything about the alignment at the point where i == 0.
Ah, you may be half right there (see below). It does appear to be
assuming that char *s (or s[i == 0]) is aligned, which isn't a
guarentee (in fact, it might never be right..). If having it actually
aligned is an important bit (as opposed to just doing the comparisons in
larger but possibly unaligned blocks) then that'd make a difference.
If the code as-is showed performance improvment even when it's working
on less-than-aligned blocks, I'd be curious what would happen if it was
actually aligned. Unfortunately, the results of such would probably be
heavily architecture dependent..
On the flip side, I am curious as to if the arguments to a stored
procedure are always aligned or not. Never had a case to care before,
but if palloc() is always going to return an aligned chunk of memory
(per MemSetAligned in c.h) it makes me wonder.Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
varlena header...
Right, but I'm talking about the base of the argument itself, not
the start of the data. If every variable length argument to a stored
procedure is palloc()'d independently, and palloc()'s always return
aligned memory, we'd at least know that the base of the argument is
aligned and could figure out the header size and then do the
comparisons accordingly. This would also mean, of course, that we'd
almost(?) never have s[i == 0] on an aligned boundary due to the
header.
Thanks,
Stephen
On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas<robertmhaas@gmail.com> wrote:
On the flip side, I am curious as to if the arguments to a stored
procedure are always aligned or not. Never had a case to care before,
but if palloc() is always going to return an aligned chunk of memory
(per MemSetAligned in c.h) it makes me wonder.Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
varlena header...
There are two points here that kind of cancel each other out :)
If the data is in fact returned from a palloc because it was the
result of some other function call then it will almost certainly have
a 4-byte header and that'll be aligned. There are some exceptions
where functions are just returning copies and copy the whole datum
though, but the point is we normally don't toast or pack varlenas
unless they're being stored on disk.
However that's all irrelevant because there's no guarantee the data
being passed will have been palloced at all. You could get a pointer
to data in a shared buffer. Ie, data on disk. That will be aligned
based on how tuples are packed on disk which is precisely where we go
out of our way to avoid wasting space on alignment.
* Greg Stark (gsstark@mit.edu) wrote:
There are two points here that kind of cancel each other out :)
Thanks for the insight. :)
Stephen
On Tue, Jun 16, 2009 at 1:41 PM, Stephen Frost<sfrost@snowman.net> wrote:
Ah, you may be half right there (see below). It does appear to be
assuming that char *s (or s[i == 0]) is aligned, which isn't a
guarentee (in fact, it might never be right..). If having it actually
aligned is an important bit (as opposed to just doing the comparisons in
larger but possibly unaligned blocks) then that'd make a difference.
On some architectures like intel accessing unaligned ints is just
slow. On others (Alpha and PPC iirc?) it is an immediate bus error.
I would actually be more curious whether we can do th e comparison
without having to pre-scan for the spaces at the end than trying to
opimize that prescan.
On Tue, Jun 16, 2009 at 8:38 AM, Greg Stark<gsstark@mit.edu> wrote:
On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas<robertmhaas@gmail.com> wrote:
I see that... but I don't think the test in the first loop is correct.
It's based on the value of i % 4, but I'm not convinced that you know
anything about the alignment at the point where i == 0.That's correct. To check the alignment you would have to look at the
actual pointer. I would suggest using the existing macros to handle
alignment. Hm, though the only one I see offhand which is relevant is
the moderately silly PointerIsAligned(). Still it would make the code
clearer even if it's pretty simple.Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
think, bogus. There would be no alignment guarantee on that array.
Personally I'm find with 0x20202020 with a comment explaining what it
is.
Ooh, good point. I still don't like the 0x20 thing, but using uint32
instead of int or long is the main point, unless we support any
platforms where 0x20 != ' '.
...Robert
Robert Haas wrote:
Ooh, good point. I still don't like the 0x20 thing, but using uint32
instead of int or long is the main point, unless we support any
platforms where 0x20 != ' '.
All our server encodings are strictly ASCII supersets. So 0x20 is always
the space character.
cheers
andrew
Hi all,
That's correct. To check the alignment you would have to look at the
actual pointer. I would suggest using the existing macros to handle
alignment. Hm, though the only one I see offhand which is relevant is
the moderately silly PointerIsAligned(). Still it would make the code
clearer even if it's pretty simple.
Yes, the code (incorrectly) assumes that any multiple-of-4 index into
the char array is aligned, and so the array itself must be aligned for
this to work.
I'll rework the patch, testing the pointer alignment directly instead.
Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
think, bogus. There would be no alignment guarantee on that array.
Personally I'm find with 0x20202020 with a comment explaining what it
is.
The variable is called 'spaces', but I can add extra comments if
preferred.
Cheers,
Jeremy