Q about heap_getattr

Started by Tom Lanealmost 27 years ago6 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I've been doing some more backend profiling, and observe that in a large
SELECT from a table with lots of columns, nocachegetattr (the guts of
heap_getattr) is at the top of the list, accounting for about 15% of
runtime.

The percentage would be lower in a table with fewer columns or no null
columns, but it still seems worth working on. (Besides, this case right
here is a real-world case for me.)

What's drawing my eye is that printtup() is calling heap_getattr twice
for each attribute of each tuple --- once in the first scan that
prepares the null-fields bitmap, and then again to actually output the
field value. So, what I want to do is call heap_getattr only once per
attribute and save the returned value for use in the second loop.
That should halve the time spent in nocachegetattr and thus knock
7 or so percent off the runtime of SELECT.

The question for the list: how long is the Datum value returned by
heap_getattr valid? In particular, could it be invalidated by calling
heap_getattr for another field of the same tuple? If there are any
cases like that, then this optimization won't work. I don't know the
backend well enough to guess whether this is safe.

regards, tom lane

#2Vadim Mikheev
vadim@krs.ru
In reply to: Tom Lane (#1)
Re: [HACKERS] Q about heap_getattr

Tom Lane wrote:

I've been doing some more backend profiling, and observe that in a large
SELECT from a table with lots of columns, nocachegetattr (the guts of
heap_getattr) is at the top of the list, accounting for about 15% of
runtime.

The percentage would be lower in a table with fewer columns or no null
columns, but it still seems worth working on. (Besides, this case right
here is a real-world case for me.)

What's drawing my eye is that printtup() is calling heap_getattr twice
for each attribute of each tuple --- once in the first scan that
prepares the null-fields bitmap, and then again to actually output the
field value. So, what I want to do is call heap_getattr only once per
attribute and save the returned value for use in the second loop.
That should halve the time spent in nocachegetattr and thus knock
7 or so percent off the runtime of SELECT.

Try to use heap_attisnull in first scan!
This func just tests nulls bitmap array of tuple...

Vadim
P.S. Tom, I forgot to attach new allocation code in my prev letter,
but now I want to reimplement them.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vadim Mikheev (#2)
Re: [HACKERS] Q about heap_getattr

Vadim Mikheev <vadim@krs.ru> writes:

Tom Lane wrote:

So, what I want to do is call heap_getattr only once per
attribute and save the returned value for use in the second loop.

Try to use heap_attisnull in first scan!

Ah, that looks like a much better idea. Consider it done...

thanks, tom lane

#4Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#2)
Re: [HACKERS] Q about heap_getattr

Tom Lane wrote:

I've been doing some more backend profiling, and observe that in a large
SELECT from a table with lots of columns, nocachegetattr (the guts of
heap_getattr) is at the top of the list, accounting for about 15% of
runtime.

The percentage would be lower in a table with fewer columns or no null
columns, but it still seems worth working on. (Besides, this case right
here is a real-world case for me.)

What's drawing my eye is that printtup() is calling heap_getattr twice
for each attribute of each tuple --- once in the first scan that
prepares the null-fields bitmap, and then again to actually output the
field value. So, what I want to do is call heap_getattr only once per
attribute and save the returned value for use in the second loop.
That should halve the time spent in nocachegetattr and thus knock
7 or so percent off the runtime of SELECT.

Try to use heap_attisnull in first scan!
This func just tests nulls bitmap array of tuple...

Vadim
P.S. Tom, I forgot to attach new allocation code in my prev letter,
but now I want to reimplement them.

Good idea. Hadn't thought of that. To answer Tom's question, it
doesn't matter how many times you call heap_getattr(). You can cache
the values, as long as the tuple doesn't change.

nocachegetattr() computes all offsets, even offsets after the column you
are requesting, to prevent future calls. You must have nulls or
varlena's that is causing nocachegetattr to be called so many times.
Is this true?

heap_getattr() certainly is called many times, and needs any
optimization we can give it. I have done as much as I could. Perhaps
there are more opportunities I missed.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Q about heap_getattr

Bruce Momjian <maillist@candle.pha.pa.us> writes:

Tom Lane wrote:

I've been doing some more backend profiling, and observe that in a large
SELECT from a table with lots of columns, nocachegetattr (the guts of
heap_getattr) is at the top of the list, accounting for about 15% of
runtime.

The percentage would be lower in a table with fewer columns or no null
columns, but it still seems worth working on. (Besides, this case right
here is a real-world case for me.)

nocachegetattr() computes all offsets, even offsets after the column you
are requesting, to prevent future calls. You must have nulls or
varlena's that is causing nocachegetattr to be called so many times.
Is this true?

Right, this table has 38 columns, many of which can be NULL and several
of which are variable-size. So it's probably the worst-case scenario as
far as the cost of nocachegetattr is concerned. It looked to me like
the pre-computation aspect of nocachegetattr only works for tables where
all the tuples have the same physical layout, ie, no varlenas or nulls;
is that right?

heap_getattr() certainly is called many times, and needs any
optimization we can give it. I have done as much as I could. Perhaps
there are more opportunities I missed.

I thought I had spotted a couple of possibilities for small improvements
of the code inside nocachegetattr, but it was awfully late by then so
I didn't try changing anything. I'll take another look.

regards, tom lane

#6Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#5)
Re: [HACKERS] Q about heap_getattr

Bruce Momjian <maillist@candle.pha.pa.us> writes:

Tom Lane wrote:

I've been doing some more backend profiling, and observe that in a large
SELECT from a table with lots of columns, nocachegetattr (the guts of
heap_getattr) is at the top of the list, accounting for about 15% of
runtime.

The percentage would be lower in a table with fewer columns or no null
columns, but it still seems worth working on. (Besides, this case right
here is a real-world case for me.)

nocachegetattr() computes all offsets, even offsets after the column you
are requesting, to prevent future calls. You must have nulls or
varlena's that is causing nocachegetattr to be called so many times.
Is this true?

Right, this table has 38 columns, many of which can be NULL and several
of which are variable-size. So it's probably the worst-case scenario as
far as the cost of nocachegetattr is concerned. It looked to me like
the pre-computation aspect of nocachegetattr only works for tables where
all the tuples have the same physical layout, ie, no varlenas or nulls;
is that right?

heap_getattr() certainly is called many times, and needs any
optimization we can give it. I have done as much as I could. Perhaps
there are more opportunities I missed.

I thought I had spotted a couple of possibilities for small improvements
of the code inside nocachegetattr, but it was awfully late by then so
I didn't try changing anything. I'll take another look.

Also, I see a few places where heap_getattr is called, just looking for
a null. You can use mkid(see developers faq) to find them. If you
don't modify them, I can.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026