Performance impact of NULLs and variable length fields
Sometime ago somebody asked if it made a difference adn it was suggested
that the effect was probably marginal. I ran a profiler over postgres doing
a large query and these are the top 10 functions:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
16.04 0.51 0.51 1676772 0.00 0.00 nocachegetattr
11.95 0.89 0.38 1427403 0.00 0.00 heapgettup
10.06 1.21 0.32 2955372 0.00 0.00 LockBuffer
6.92 1.43 0.22 3406475 0.00 0.00 ExecEvalExpr
5.03 1.59 0.16 1617018 0.00 0.00 AllocSetReset
4.40 1.73 0.14 1427403 0.00 0.00 heap_getnext
4.09 1.86 0.13 1844339 0.00 0.00 ExecEvalVar
4.09 1.99 0.13 1441330 0.00 0.00 ReleaseBuffer
4.09 2.12 0.13 1427064 0.00 0.00 SeqNext
3.46 2.23 0.11 1552338 0.00 0.00 ExecQual
Note: the wall clock time of the query was about 30 seconds. The total cpu
time was 3.18 seconds.
Now, nocachegetattr is called mostly whenever there are NULLs or variable
length strings in the tuple. Since our biggest table begins with a variable
length field, every single lookup is going to be uncached. Looks like the
effect is significant.
Is this ever going to change?
Just for interests sake i've also got the results for a plain select sum(x)
from y;
% cumulative self self total
time seconds seconds calls ms/call ms/call name
9.12 0.60 0.60 1423541 0.00 0.00 heapgettup
8.51 1.16 0.56 1423329 0.00 0.00 nocachegetattr
6.08 1.56 0.40 2850372 0.00 0.00 AllocSetAlloc
5.47 1.92 0.36 1423308 0.00 0.00 ExecTargetList
5.02 2.25 0.33 2846619 0.00 0.00 AllocSetReset
4.10 2.52 0.27 1423308 0.00 0.00 heap_formtuple
3.80 2.77 0.25 2846614 0.00 0.00 ExecEvalVar
3.80 3.02 0.25 2 125.00 922.33 ExecAgg
3.65 3.26 0.24 1423310 0.00 0.00 ExecProcNode
3.04 3.46 0.20 1423307 0.00 0.00 advance_transition_function
Seems to me that the generality of the aggregate implementation doesn't seem
to really be an issue.
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
Show quoted text
It would be nice if someone came up with a certification system that
actually separated those who can barely regurgitate what they crammed over
the last few weeks from those who command secret ninja networking powers.
Martijn van Oosterhout <kleptog@svana.org> writes:
Sometime ago somebody asked if it made a difference adn it was suggested
that the effect was probably marginal. I ran a profiler over postgres doing
a large query and these are the top 10 functions:
This is pretty meaningless when you didn't give one single detail about
the test query or conditions.
regards, tom lane
Well, Tom pointed out that without the query itself this was fairly
meaningless. So here it is:
select c.id, sum(d.cost)
from c, l, d
where c.id = l.cid
and l.clid = d.clid
and c.accman = '2500'
and billid is null
group by c.id;
EXPLAIN output:
Aggregate (cost=51479.55..51507.07 rows=550 width=48) (actual time=26192.07..26222.07 rows=156 loops=1)
-> Group (cost=51479.55..51493.31 rows=5503 width=48) (actual time=26165.81..26186.14 rows=2029 loops=156)
-> Sort (cost=51479.55..51479.55 rows=5503 width=48) (actual time=26164.90..26167.77 rows=2029 loops=1)
-> Hash Join (cost=676.79..51137.62 rows=5503 width=48) (actual time=3703.82..26122.23 rows=2029 loops=1)
-> Seq Scan on d (cost=0.00..50150.07 rows=32262 width=12) (actual time=67.35..24921.42 rows=41045 loops=1)
-> Hash (cost=671.31..671.31 rows=2193 width=36) (actual time=594.55..594.55 rows=0 loops=1)
-> Merge Join (cost=260.50..671.31 rows=2193 width=36) (actual time=251.95..551.40 rows=1330 loops=1)
-> Index Scan using l_cid on l (cost=0.00..364.36 rows=6308 width=20) (actual time=33.95..340.52 rows=6307 loops=1)
-> Sort (cost=260.50..260.50 rows=1305 width=16) (actual time=142.85..145.34 rows=1337 loops=1)
-> Seq Scan on c (cost=0.00..192.94 rows=1305 width=16) (actual time=14.99..103.40 rows=1314 loops=1)
d has over 1.4 million rows, l about 10,000 and c about 2,000.
NULL columns and variable length fields used extensivly.
On Sat, Jul 21, 2001 at 05:25:38PM +1000, Martijn van Oosterhout wrote:
Sometime ago somebody asked if it made a difference adn it was suggested
that the effect was probably marginal. I ran a profiler over postgres doing
a large query and these are the top 10 functions:% cumulative self self total
time seconds seconds calls ms/call ms/call name
16.04 0.51 0.51 1676772 0.00 0.00 nocachegetattr
11.95 0.89 0.38 1427403 0.00 0.00 heapgettup
10.06 1.21 0.32 2955372 0.00 0.00 LockBuffer
6.92 1.43 0.22 3406475 0.00 0.00 ExecEvalExpr
5.03 1.59 0.16 1617018 0.00 0.00 AllocSetReset
4.40 1.73 0.14 1427403 0.00 0.00 heap_getnext
4.09 1.86 0.13 1844339 0.00 0.00 ExecEvalVar
4.09 1.99 0.13 1441330 0.00 0.00 ReleaseBuffer
4.09 2.12 0.13 1427064 0.00 0.00 SeqNext
3.46 2.23 0.11 1552338 0.00 0.00 ExecQual
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
Show quoted text
It would be nice if someone came up with a certification system that
actually separated those who can barely regurgitate what they crammed over
the last few weeks from those who command secret ninja networking powers.
Martijn van Oosterhout wrote:
Sometime ago somebody asked if it made a difference adn it was suggested
that the effect was probably marginal. I ran a profiler over postgres doing
a large query and these are the top 10 functions:% cumulative self self total
time seconds seconds calls ms/call ms/call name
16.04 0.51 0.51 1676772 0.00 0.00 nocachegetattr
11.95 0.89 0.38 1427403 0.00 0.00 heapgettup
10.06 1.21 0.32 2955372 0.00 0.00 LockBuffer
6.92 1.43 0.22 3406475 0.00 0.00 ExecEvalExpr
5.03 1.59 0.16 1617018 0.00 0.00 AllocSetReset
4.40 1.73 0.14 1427403 0.00 0.00 heap_getnext
4.09 1.86 0.13 1844339 0.00 0.00 ExecEvalVar
4.09 1.99 0.13 1441330 0.00 0.00 ReleaseBuffer
4.09 2.12 0.13 1427064 0.00 0.00 SeqNext
3.46 2.23 0.11 1552338 0.00 0.00 ExecQual
I have added this to the TODO list:
* Improve caching of attribute offsets when NULLs exist in the row
I have some ideas about how offsets with NULL's could be handled in this
case.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026