counting pallocs

Started by Robert Haas almost 14 years ago | 4 messages | hackers
#1Robert Haas
robertmhaas@gmail.com

The attached patch provides some rough instrumentation for determining
where palloc calls are coming from. This is obviously just for
noodling around with, not for commit, and there may well be bugs. But
enjoy.
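
For readers without the patch at hand, the general idea of charging each allocation to its call site can be sketched roughly as follows. This is not the attached patch: counted_alloc(), COUNTED_ALLOC(), site_count(), report_sites() and MAX_SITES are hypothetical names made up for illustration, and malloc() stands in for palloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SITES 256           /* hypothetical cap on distinct call sites */

typedef struct AllocSite
{
    const char *file;           /* source file of the caller */
    int         line;           /* source line of the caller */
    unsigned long count;        /* allocations charged to this site */
} AllocSite;

static AllocSite sites[MAX_SITES];
static int  nsites = 0;

/* Allocate memory and charge the allocation to the (file, line) call site. */
void *
counted_alloc(size_t size, const char *file, int line)
{
    int         i;

    /* Linear search keeps the sketch short; a hash table would scale better. */
    for (i = 0; i < nsites; i++)
    {
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
            break;
    }

    if (i == nsites)
    {
        /* First allocation from this site: claim a new slot. */
        assert(nsites < MAX_SITES);
        sites[i].file = file;
        sites[i].line = line;
        sites[i].count = 0;
        nsites++;
    }

    sites[i].count++;
    return malloc(size);        /* stand-in for the real allocator */
}

/* Callers use this macro so the call site is recorded automatically. */
#define COUNTED_ALLOC(size) counted_alloc((size), __FILE__, __LINE__)

/* Return the number of allocations charged to a site (0 if never seen). */
unsigned long
site_count(const char *file, int line)
{
    for (int i = 0; i < nsites; i++)
    {
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
            return sites[i].count;
    }
    return 0;
}

/* Print one "file:line<TAB>count" row per call site. */
void
report_sites(FILE *out)
{
    for (int i = 0; i < nsites; i++)
        fprintf(out, "%s:%d\t%lu\n", sites[i].file, sites[i].line, sites[i].count);
}
```

A caller would write COUNTED_ALLOC(n) where it previously called the allocator, and call report_sites(stderr) at exit. Keying on the immediate caller alone cannot tell which higher-level function reached a shared helper, so attributions like "via lappend" need more of the call path than this sketch records.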

I gave this a quick spin on a couple of test workloads: a very short
pgbench test, a very short pgbench -S test, and the regression tests.
On the pgbench test, the top culprits are ExecInitExpr() and
expression_tree_mutator(); in both cases, the lappend() call for the
T_List case is the major contributor. Other significant contributors
include _copyVar(), which I haven't drilled into terribly far but
seems to be coming mostly from add_vars_to_targetlist();
buildRelationAliases() via lappend, pstrdup, and makeString;
ExecAllocTupleTableSlot(); and makeColumnRef() via makeNode, lcons,
and makeString.

The pgbench -S results are similar, but build_physical_tlist() also
pops up fairly high.

On the regression tests, heap_tuple_untoast_attr() is at the very top
of the list, and specifically for the VARATT_IS_SHORT() case. It
might be good to disaggregate this some more, but I'm too tired for
that right now. index_form_tuple()'s palloc0 call comes in second,
and heap_form_minimal_tuple()'s palloc0 is third.
LockAcquireExtended()'s allocation of a new LOCALLOCK entry also comes
in pretty high; ExecInitExpr() shows up here too; and heap_form_tuple()
shows up as well.

One piece of reasonably low-hanging fruit appears to be OpExpr. It
seems like it would be better all around to put Node *arg1 and Node
*arg2 in there instead of a list... aside from saving pallocs, it
seems like it would generally simplify the code.
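
To make the allocation difference concrete, here is a rough sketch comparing the two layouts. The types are simplified stand-ins, not PostgreSQL's real definitions: SketchList, Cell, ListOpExpr, PairOpExpr, counting_alloc() and the make_* helpers are hypothetical names, and the list is assumed to be a cell-based linked list with a separately allocated header.

```c
#include <assert.h>
#include <stdlib.h>

static int  alloc_count = 0;    /* number of allocations made so far */

/* Zeroing allocator that counts calls; stands in for palloc0(). */
static void *
counting_alloc(size_t size)
{
    void       *p = calloc(1, size);

    if (p == NULL)
        abort();
    alloc_count++;
    return p;
}

typedef struct Node { int tag; } Node;

/* Assumed list shape: one header plus one cell per element. */
typedef struct Cell { void *ptr; struct Cell *next; } Cell;
typedef struct SketchList { int length; Cell *head; Cell *tail; } SketchList;

/* Append datum, allocating the header on first use and one cell per call. */
static SketchList *
sketch_lappend(SketchList *list, void *datum)
{
    Cell       *cell = counting_alloc(sizeof(Cell));

    cell->ptr = datum;
    if (list == NULL)
    {
        list = counting_alloc(sizeof(SketchList));
        list->head = cell;
    }
    else
        list->tail->next = cell;
    list->tail = cell;
    list->length++;
    return list;
}

/* List-based layout: arguments hang off a separately allocated list. */
typedef struct ListOpExpr { unsigned opno; SketchList *args; } ListOpExpr;

/* Proposed layout: the two arguments are plain fields of the node. */
typedef struct PairOpExpr { unsigned opno; Node *arg1; Node *arg2; } PairOpExpr;

ListOpExpr *
make_list_opexpr(unsigned opno, Node *left, Node *right)
{
    ListOpExpr *expr = counting_alloc(sizeof(ListOpExpr));

    expr->opno = opno;
    expr->args = sketch_lappend(NULL, left);        /* header + first cell */
    expr->args = sketch_lappend(expr->args, right); /* second cell */
    return expr;
}

PairOpExpr *
make_pair_opexpr(unsigned opno, Node *left, Node *right)
{
    PairOpExpr *expr = counting_alloc(sizeof(PairOpExpr));

    expr->opno = opno;
    expr->arg1 = left;
    expr->arg2 = right;
    return expr;
}
```

In this sketch the list-based node costs four allocations (the node, the list header, and two cells) against one for the two-field node. The sketch does not address unary operators, which a fixed two-field layout would presumably represent with one field left NULL.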

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

count-pallocs.patch (application/octet-stream; name=count-pallocs.patch; +548 -267)
#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#1)
Re: counting pallocs

On 17.05.2012 06:43, Robert Haas wrote:

The attached patch provides some rough instrumentation for determining
where palloc calls are coming from. This is obviously just for
noodling around with, not for commit, and there may well be bugs. But
enjoy.

I gave this a quick spin on a couple of test workloads: a very short
pgbench test, a very short pgbench -S test, and the regression tests.
On the pgbench test, the top culprits are ExecInitExpr() and
expression_tree_mutator(); in both cases, the lappend() call for the
T_List case is the major contributor. Other significant contributors
include _copyVar(), which I haven't drilled into terribly far but
seems to be coming mostly from add_vars_to_targetlist();
buildRelationAliases() via lappend, pstrdup, and makeString;
ExecAllocTupleTableSlot(); and makeColumnRef() via makeNode, lcons,
and makeString.

What percentage of total CPU usage is the palloc() overhead in these
tests? If we could totally eliminate the palloc() overhead, how much
faster would the test run?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: counting pallocs

On Thu, May 17, 2012 at 2:28 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

What percentage of total CPU usage is the palloc() overhead in these tests?
If we could totally eliminate the palloc() overhead, how much faster would
the test run?

AllocSetAlloc is often the top CPU consumer in profiling results, but
it's typically only in the single-digit percentages. However, there's
also some distributed overhead that's more difficult to measure. For
example, the fact that OpExpr uses a List instead of directly pointing
to its arguments costs us three pallocs - plus three more if we ever
copy it - but it also means that accessing the first element of an
OpExpr requires three pointer dereferences instead of one, and
accessing the second one requires four pointer dereferences instead of
one. There's no real way to isolate the overhead of that, but it's
got to cost at least something.
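
The dereference counts above can be read directly off the access paths. The following is only an illustration with simplified stand-in types (Cell, SketchList, ListOpExpr, PairOpExpr and the accessor names are hypothetical), assuming a list made of a header that points to linked cells.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { int tag; } Node;

/* Assumed list shape: a header pointing at a chain of cells. */
typedef struct Cell { void *ptr; struct Cell *next; } Cell;
typedef struct SketchList { int length; Cell *head; } SketchList;

typedef struct ListOpExpr { SketchList *args; } ListOpExpr;
typedef struct PairOpExpr { Node *arg1; Node *arg2; } PairOpExpr;

/* Three dependent loads: expr->args, then ->head, then ->ptr. */
Node *
list_first_arg(const ListOpExpr *expr)
{
    return (Node *) expr->args->head->ptr;
}

/* Four dependent loads: expr->args, ->head, ->next, then ->ptr. */
Node *
list_second_arg(const ListOpExpr *expr)
{
    return (Node *) expr->args->head->next->ptr;
}

/* One load each: the argument pointer is a field of the node itself. */
Node *
pair_first_arg(const PairOpExpr *expr)
{
    return expr->arg1;
}

Node *
pair_second_arg(const PairOpExpr *expr)
{
    return expr->arg2;
}
```

Each load in the list versions depends on the result of the previous one, so the chain cannot be overlapped and every hop is a chance for a cache miss. That is the distributed cost that does not show up under any single function in a profile.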

The reality - I'm not sure whether it's a happy reality or a sad
reality - is that most CPU profiles of PostgreSQL are pretty flat.
The nails that stick up have, for the most part, long since been
pounded down. If we want to make further improvements to our parse
and plan time, and I do, because I think we lag our competitors, then
I think this is the kind of stuff we need to look at.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: counting pallocs

Robert Haas <robertmhaas@gmail.com> writes:

One piece of reasonably low-hanging fruit appears to be OpExpr. It
seems like it would be better all around to put Node *arg1 and Node
*arg2 in there instead of a list... aside from saving pallocs, it
seems like it would generally simplify the code.

Obviously, Stephen Frost's list-allocation patch would affect your
results here ... but I wonder how much the above change would affect
*his* results. Specifically, the observation that most lists are 1
or 2 elements long would presumably become less true, but I wonder
by how much exactly.

regards, tom lane