Tightly packing expressions

Started by Douglas Dooleover 8 years ago2 messageshackers
Jump to latest
#1Douglas Doole
dougdoole@gmail.com

Andres,
Back when you were getting ready to commit your faster expressions, I
volunteered to look at switching from an array of fixed sized steps to
tightly packed structures. (
/messages/by-id/20170314221648.jrcgh5n7ld4ej2o7@alap3.anarazel.de).
I've finally found time to make it happen.

Using tightly packed structures makes the expressions a lot smaller. I
instrumented some before and after builds and ran "make check" just to see
how much memory expressions were using. What I found was:

There were ~104K expressions.

Memory - bytes needed to hold the steps of the expressions
Allocated Memory - memory is allocated in blocks, this is the bytes
allocated to hold the expressions
Wasted Memory - the difference between the allocated and used memory

Original code:
Memory: min=64 max=9984 total=28232512 average=265
Allocated Memory: min=1024 max=16384 total=111171584 average=1045
Wasted Memory: min=0 max=6400 total=82939072 average=780

New code:
Memory: min=32 (50%) max=8128 (82%) total=18266712 (65%) average=175 (66%)
Allocated Memory: min=192 (19%) max=9408 (57%) total=29386176 (26%)
average=282 (27%)
Wasted Memory: min=0 (0%) max=1280 (20%) total=11119464 (13%) average=106
(14%)

So on average expressions are about a third smaller than with the fixed
size array. The new approach doesn't overallocate as badly as the array
approach so the packed approach cuts memory consumption by over 70%. (Of
course, the array code could be tuned to reduce the overhead as well.)

Another benefit of the way I did this is that the expression memory is
never reallocated. This means that it is safe for one step to point
directly to a field in another step without needing to separately palloc
storage. That should allow us to simplify the code and reduce pointer
traversals. (I haven't exploited this in the patch. I figured it would be a
task for round 2 assuming you like what I've done here.)

The work was mostly mechanical. The only tricky bit was dealing with the
places where you jump to step n+1 while building step n. Since we can't
tell the address of step n+1 until it is actually built, I had to defer
resolution of the jumps. So all the interesting bits of this patch are in
ExprEvalPushStep().

Let me know what you think.

- Doug
Salesforce

Attachments:

exprBuild.patchtext/x-patch; charset=US-ASCII; name=exprBuild.patchDownload+3288-2645
#2Andres Freund
andres@anarazel.de
In reply to: Douglas Doole (#1)
Re: Tightly packing expressions

Hi Doug,

On 2017-08-03 18:01:06 +0000, Douglas Doole wrote:

Back when you were getting ready to commit your faster expressions, I
volunteered to look at switching from an array of fixed sized steps to
tightly packed structures. (
/messages/by-id/20170314221648.jrcgh5n7ld4ej2o7@alap3.anarazel.de).
I've finally found time to make it happen.

Thanks for working on it!

Using tightly packed structures makes the expressions a lot smaller. I
instrumented some before and after builds and ran "make check" just to see
how much memory expressions were using. What I found was:

There were ~104K expressions.

Memory - bytes needed to hold the steps of the expressions
Allocated Memory - memory is allocated in blocks, this is the bytes
allocated to hold the expressions
Wasted Memory - the difference between the allocated and used memory

Original code:
Memory: min=64 max=9984 total=28232512 average=265
Allocated Memory: min=1024 max=16384 total=111171584 average=1045
Wasted Memory: min=0 max=6400 total=82939072 average=780

New code:
Memory: min=32 (50%) max=8128 (82%) total=18266712 (65%) average=175 (66%)
Allocated Memory: min=192 (19%) max=9408 (57%) total=29386176 (26%)
average=282 (27%)
Wasted Memory: min=0 (0%) max=1280 (20%) total=11119464 (13%) average=106
(14%)

That's actually not *that* big of a saving...

Another benefit of the way I did this is that the expression memory is
never reallocated. This means that it is safe for one step to point
directly to a field in another step without needing to separately palloc
storage. That should allow us to simplify the code and reduce pointer
traversals. (I haven't exploited this in the patch. I figured it would be a
task for round 2 assuming you like what I've done here.)

Yes, that's a neat benefit. Although I think it'd even better if we
could work to the point where the mutable and the unchanging data is
separately allocated, so we can at some point avoid redundant expression
"compilations".

The work was mostly mechanical. The only tricky bit was dealing with the
places where you jump to step n+1 while building step n. Since we can't
tell the address of step n+1 until it is actually built, I had to defer
resolution of the jumps. So all the interesting bits of this patch are in
ExprEvalPushStep().

I was wondering about a more general "symbol resolution" stage
anyway. Then we'd allocate individual steps during ExecInitExprRec, and
allocate the linear array after we know the exact size.

I think it'd be important to see some performance measurements,
especially for larger queries. It'd not be too surprising if the
different method of dereffing the next expression actually has a
negative performance effect, but I'm absolutely not sure of that.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers