Multiple sorts in a query
Just wanted to check some thoughts about how memory allocation works in
complex queries. Been thinking some more about recent Solaris testing
results that *seemed* to show issues with multiple concurrent queries
that have multiple sorts.
If we have a query that uses multiple sorts, we may have a top-level
sort, with child nodes that contain sorts also. In some cases we may
find sub-nodes whose inner and outer sub-trees both contain sorts
as well.
If we allocate large chunks of memory we use malloc(). So complex
queries can have multiple mallocs, followed by multiple reallocs. That
in itself seems likely to end up with roughly double memory use, since
realloc won't work properly/quickly with multiple mallocs. (Double since
we allocate X bytes, then 2X bytes etc until we hit the limit.)
When we later free() the memory, do we always free() it in the reverse
order in which it was allocated? If not, how does that affect reducing
the sbrk point, or other aspects of reusing allocated memory?
Is it possible that Solaris's default malloc isn't appropriate for
repeated use in complex queries that use multiple sorts?
http://developers.sun.com/solaris/articles/multiproc/multiproc.html
and recent OpenSolaris bug reports.
Anyway, feel free to jump in.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, May 19, 2009 at 12:32:13PM +0100, Simon Riggs wrote:
If we allocate large chunks of memory we use malloc(). So complex
queries can have multiple mallocs, followed by multiple reallocs. That
in itself seems likely to end up with roughly double memory use, since
realloc won't work properly/quickly with multiple mallocs. (Double since
we allocate X bytes, then 2X bytes etc until we hit the limit.)
I don't know about Solaris, but glibc has a threshold above which it
starts using mmap() instead of sbrk(). Thus, once you start using very
large blocks, freeing always returns the memory to the kernel,
irrespective of other allocations.
The threshold is dynamic apparently, but starts at 128KB.
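A minimal sketch of how that threshold could be pinned on glibc, assuming the mallopt() interface and the M_MMAP_THRESHOLD parameter from <malloc.h>. The helper name pin_mmap_threshold is hypothetical, and nothing here applies to the Solaris allocators. Per the glibc documentation, setting the parameter explicitly also turns off the dynamic adjustment mentioned above.

```c
#include <assert.h>
#include <stdlib.h>   /* on glibc this also defines __GLIBC__ */
#ifdef __GLIBC__
#include <malloc.h>   /* mallopt(), M_MMAP_THRESHOLD */
#endif

/*
 * Hypothetical helper: ask glibc to serve requests of at least `bytes`
 * with mmap() instead of growing the heap with sbrk().
 * Returns 1 on success, 0 if glibc rejects the value, and -1 when the
 * program is not built against glibc.
 */
int pin_mmap_threshold(int bytes)
{
    assert(bytes >= 0);
#ifdef __GLIBC__
    return mallopt(M_MMAP_THRESHOLD, bytes);
#else
    (void) bytes;     /* no portable equivalent; report "not supported" */
    return -1;
#endif
}
```

Calling pin_mmap_threshold(128 * 1024) early in main() would keep the threshold at the initial 128KB value, so blocks at or above that size are unmapped on free() regardless of what else sits on the heap.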
Just a thought,
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
On Tue, May 19, 2009 at 12:32 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
If we have a query that uses multiple sorts, we may have a top-level
sort, with child nodes that contain sorts also. In some cases we may
find sub-nodes whose inner and outer sub-trees both contain sorts
as well.
Well a top-level sort and a child sort wouldn't both be accumulating
rows at the same time. The child could still be alive behaving like a
tuplestore though.
If we allocate large chunks of memory we use malloc(). So complex
queries can have multiple mallocs, followed by multiple reallocs. That
in itself seems likely to end up with roughly double memory use, since
realloc won't work properly/quickly with multiple mallocs. (Double since
we allocate X bytes, then 2X bytes etc until we hit the limit.)
I think it's even worse than that since the old and new allocation
have to briefly coexist. So at least transiently we use 3x the size of
the actual array.
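The arithmetic behind that 3x figure can be sketched as a small model. It assumes the worst case, where every realloc() has to copy into a fresh block, so the old block of X bytes and the new block of 2X bytes coexist until the copy finishes. The names GrowthStats and simulate_doubling are illustrative only and do not come from the PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t final_capacity;  /* capacity after the last doubling */
    size_t peak_footprint;  /* largest old + new total seen during a copy */
} GrowthStats;

/*
 * Model an array that starts at `initial` bytes and doubles while the
 * doubled size still fits within `limit`. Every step is assumed to copy,
 * so the old and new blocks are both counted while they coexist.
 */
GrowthStats simulate_doubling(size_t initial, size_t limit)
{
    GrowthStats s;
    size_t cap = initial;

    assert(initial > 0 && initial <= limit);
    s.peak_footprint = cap;

    while (cap <= limit / 2) {          /* doubling stays within limit */
        size_t next = cap * 2;
        size_t both = cap + next;       /* old block + new block */

        if (both > s.peak_footprint)
            s.peak_footprint = both;
        cap = next;
    }
    s.final_capacity = cap;
    return s;
}
```

For example, simulate_doubling(1024, 8192) ends with a capacity of 8192 bytes and a peak of 12288 bytes, reached during the last copy: three times the 4096 bytes of data being moved, or 1.5 times the final capacity.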
When we later free() the memory, do we always free() it in the reverse
order in which it was allocated? If not, how does that affect reducing
the sbrk point, or other aspects of reusing allocated memory?
Is it possible that Solaris's default malloc isn't appropriate for
repeated use in complex queries that use multiple sorts?
Well anything's possible. Do you have any specific ideas? I would
expect any decent malloc library to shrink sbrk based on statically
analyzing where its allocations actually are, so I wouldn't expect the
pattern of frees to matter on that front. It might still fragment
memory if we allocate a bunch of large tuplestore/tuplesorts and then
allocate one object in a longer lived memory context.
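A toy model of that fragmentation case, assuming blocks are laid out contiguously in allocation order and that the allocator can only hand back memory above the highest block still in use. This is a simplification for illustration, not a description of any real malloc, and the function name lowest_possible_break is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * sizes[i] is the size of the i-th block in allocation order and live[i]
 * says whether it is still allocated. Returns the lowest offset the break
 * could be trimmed to, which is the end of the last live block. The order
 * in which the dead blocks were freed plays no part in the result.
 */
size_t lowest_possible_break(const size_t *sizes, const bool *live, size_t n)
{
    size_t offset = 0;   /* end of the block currently being examined */
    size_t brk = 0;      /* end of the highest live block seen so far */

    assert(n == 0 || (sizes != NULL && live != NULL));

    for (size_t i = 0; i < n; i++) {
        offset += sizes[i];
        if (live[i])
            brk = offset;
    }
    return brk;
}
```

With two freed 1000-byte sort buffers followed by one live 8-byte object, the model returns 2008: the small long-lived allocation pins both freed buffers below it. If the live block is the first one instead, the break can drop to 1000.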
What problems have you seen?
--
greg
On Tue, May 19, 2009 at 7:44 AM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
On Tue, May 19, 2009 at 12:32:13PM +0100, Simon Riggs wrote:
If we allocate large chunks of memory we use malloc(). So complex
queries can have multiple mallocs, followed by multiple reallocs. That
in itself seems likely to end up with roughly double memory use, since
realloc won't work properly/quickly with multiple mallocs. (Double since
we allocate X bytes, then 2X bytes etc until we hit the limit.)
I don't know about Solaris, but glibc has a threshold above which it
starts using mmap() instead of sbrk(). Thus, once you start using very
large blocks, freeing always returns the memory to the kernel,
irrespective of other allocations.
The threshold is dynamic apparently, but starts at 128KB.
I just read an article that suggests assuming that can be dangerous
(by one of the authors of jemalloc)...an interesting read.
"Update in 2006:
The above was written in 2001. Since then the world has changed a lot.
Memory got bigger. Applications got bigger. The virtual address space
layout in 32 bit linux changed.
In the new situation, brk() and mmap space is shared and there are no
artificial limits on brk size imposed by the kernel. What is more,
applications have started using transient allocations larger than the
128Kb as was imagined in 2001."
http://www.canonware.com/~ttt/2009/05/mr-malloc-gets-schooled.html
merlin
On Tue, 2009-05-19 at 09:17 -0400, Merlin Moncure wrote:
On Tue, May 19, 2009 at 7:44 AM, Martijn van Oosterhout
The threshold is dynamic apparently, but starts at 128KB.
I just read an article that suggests assuming that can be dangerous
(by one of the authors of jemalloc)...an interesting read.
http://www.canonware.com/~ttt/2009/05/mr-malloc-gets-schooled.html
Thanks both, interesting read. Hmmm...
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Simon Riggs
Sent: Tuesday, May 19, 2009 4:32 AM
To: pgsql-hackers
Subject: [HACKERS] Multiple sorts in a query
Just wanted to check some thoughts about how memory allocation works in
complex queries. Been thinking some more about recent Solaris testing
results that *seemed* to show issues with multiple concurrent queries
that have multiple sorts.
If we have a query that uses multiple sorts, we may have a top-level
sort, with child nodes that contain sorts also. In some cases we may
find sub-nodes whose inner and outer sub-trees both contain sorts
as well.
If we allocate large chunks of memory we use malloc(). So complex
queries can have multiple mallocs, followed by multiple reallocs. That
in itself seems likely to end up with roughly double memory use, since
realloc won't work properly/quickly with multiple mallocs. (Double since
we allocate X bytes, then 2X bytes etc until we hit the limit.)
When we later free() the memory, do we always free() it in the reverse
order in which it was allocated? If not, how does that affect reducing
the sbrk point, or other aspects of reusing allocated memory?
Is it possible that Solaris's default malloc isn't appropriate for
repeated use in complex queries that use multiple sorts?
http://developers.sun.com/solaris/articles/multiproc/multiproc.html
and recent OpenSolaris bug reports.
Solaris default malloc always uses sbrk(), and never ever tries to reduce the sbrk point.
If you want a malloc that uses mmap, there is a non-default malloc that does that (libumem or something?)
On Tue, 2009-05-19 at 09:33 -0700, Chuck McDevitt wrote:
Is it possible that Solaris's default malloc isn't appropriate for
repeated use in complex queries that use multiple sorts?
http://developers.sun.com/solaris/articles/multiproc/multiproc.html
and recent OpenSolaris bug reports.
Solaris default malloc always uses sbrk(), and never ever tries to
reduce the sbrk point.
If you want a malloc that uses mmap, there is a non-default malloc
that does that (libumem or something?)
OK, thanks Chuck. Doesn't sound good.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2009-05-19 at 13:52 +0100, Greg Stark wrote:
So at least transiently we use 3x the size of the actual array.
I was conjecturing, prior to investigation. Are you saying you know
this/have seen this already?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Chuck McDevitt wrote on Tue, 19 May 2009 at 09:33 -0700:
Solaris default malloc always uses sbrk(), and never ever tries to reduce the sbrk point.
If you want a malloc that uses mmap, there is a non-default malloc that does that (libumem or something?)
There are several memory allocators on Solaris. You can choose what you
need. See
mapmalloc (it uses mmap instead of sbrk)
mtmalloc (optimized for multi-threaded apps)
umem_alloc
watchmalloc
bsdmalloc
maybe more.
What I heard is that the standard malloc is not good, but it is still
here for compatibility with old applications that depend on some of its
functionality.
Zdenek
Well I'm just saying if you realloc an x kilobyte block into a 2x block
and the allocator can't expand it in place and has to copy, then it
seems inevitable.
--
Greg
On 19 May 2009, at 14:11, Simon Riggs <simon@2ndQuadrant.com> wrote:
On Tue, 2009-05-19 at 13:52 +0100, Greg Stark wrote:
So at least transiently we use 3x the size of the actual array.
I was conjecturing, prior to investigation. Are you saying you know
this/have seen this already?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2009-05-19 at 22:19 +0200, Zdenek Kotala wrote:
Chuck McDevitt wrote on Tue, 19 May 2009 at 09:33 -0700:
Solaris default malloc always uses sbrk(), and never ever tries to reduce the sbrk point.
If you want a malloc that uses mmap, there is a non-default malloc that does that (libumem or something?)
There are several memory allocators on Solaris. You can choose what you
need. See
mapmalloc (it uses mmap instead of sbrk)
mtmalloc (optimized for multi-threaded apps)
umem_alloc
watchmalloc
bsdmalloc
maybe more.
What I heard is that the standard malloc is not good, but it is still
here for compatibility with old applications that depend on some of its
functionality.
Which one is used in the default PostgreSQL build for Solaris? If you
use default malloc, have you tested the others and would you recommend
one in particular?
Which one has Dimitri used in his performance testing?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2009-05-19 at 16:49 -0400, Greg Stark wrote:
Well I'm just saying if you realloc an x kilobyte block into a 2x block
and the allocator can't expand it in place and has to copy, then it
seems inevitable.
OK, understood.
So there are grounds at least for an investigation into how that works
and whether it is as inefficient as we think it might be.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On 05/20/2009 10:14 AM, Simon Riggs wrote:
On Tue, 2009-05-19 at 22:19 +0200, Zdenek Kotala wrote:
Chuck McDevitt wrote on Tue, 19 May 2009 at 09:33 -0700:
What I heard is that the standard malloc is not good, but it is still
here for compatibility with old applications that depend on some of its
functionality.
Which one is used in the default PostgreSQL build for Solaris? If you
use default malloc, have you tested the others and would you recommend
one in particular?
You don't even need to recompile it most of the time (unless statically
compiled or similar things). LD_PRELOAD'ing another malloc library
should normally be enough.
Andres
Simon Riggs wrote on Wed, 20 May 2009 at 09:14 +0100:
What I heard is that the standard malloc is not good, but it is still
here for compatibility with old applications that depend on some of its
functionality.
Which one is used in the default PostgreSQL build for Solaris? If you
use default malloc, have you tested the others and would you recommend
one in particular?
We use the default one. I have not tested the differences between them,
but IIRC Jignesh did some testing with umem. I will ask him. However, if
you give me a test scenario I can test it.
Zdenek
On Wed, 2009-05-20 at 23:01 -0400, Zdenek Kotala wrote:
We use the default one. I have not tested the differences between them,
but IIRC Jignesh did some testing with umem. I will ask him. However, if
you give me a test scenario I can test it.
Talk with Dimitri from Sun, who is doing scalability benchmarks on the
pgsql-perform list now.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support