changing sort_mem on the fly?

Started by Lonni J Friedmanabout 21 years ago20 messagesgeneral
Jump to latest
#1Lonni J Friedman
netllama@gmail.com

Is there any way to increase sort_mem without having to perform a full
restart of the DB? Will just doing a 'kill -HUP $PID" work as long as
i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

thanks!

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org

#2Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Lonni J Friedman (#1)
Re: changing sort_mem on the fly?

On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:

Is there any way to increase sort_mem without having to perform a full
restart of the DB? Will just doing a 'kill -HUP $PID" work as long as
i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#3Michael Fuhr
mike@fuhr.org
In reply to: Jim Nasby (#2)
Re: changing sort_mem on the fly?

On Thu, Jan 27, 2005 at 06:23:21PM -0600, Jim C. Nasby wrote:

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.

Ordinary users can set sort_mem/work_mem, at least in 8.0.0 and
earlier. Does setting that variable per session actually have an
effect, and if it does, does allowing ordinary users to set it
present a risk?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#4Jonel Rienton
jonel@road14.com
In reply to: Lonni J Friedman (#1)
Re: changing sort_mem on the fly?

I usually use pg_ctl reload instead of kill

On Thu, 27 Jan 2005 14:57:22 -0800, Lonni J Friedman wrote

Is there any way to increase sort_mem without having to perform a
full restart of the DB? Will just doing a 'kill -HUP $PID" work as
long as i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

thanks!

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
Jonel Rienton
http://blogs.road14.com
Software Developer, *nix Advocate

#5Lonni J Friedman
netllama@gmail.com
In reply to: Jonel Rienton (#4)
Re: changing sort_mem on the fly?

OK, but does that actually work to change sort_mem for all future connections?

On Thu, 27 Jan 2005 19:25:51 -0600, Jonel Rienton <jonel@road14.com> wrote:

I usually use pg_ctl reload instead of kill

On Thu, 27 Jan 2005 14:57:22 -0800, Lonni J Friedman wrote

Is there any way to increase sort_mem without having to perform a
full restart of the DB? Will just doing a 'kill -HUP $PID" work as
long as i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org

#6Lonni J Friedman
netllama@gmail.com
In reply to: Jim Nasby (#2)
Re: changing sort_mem on the fly?

On Thu, 27 Jan 2005 18:23:21 -0600, Jim C. Nasby <decibel@decibel.org> wrote:

On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:

Is there any way to increase sort_mem without having to perform a full
restart of the DB? Will just doing a 'kill -HUP $PID" work as long as
i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.

I need to do it for all future connections, not just one.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Michael Fuhr (#3)
Re: changing sort_mem on the fly?

On Thu, Jan 27, 2005 at 06:23:05PM -0700, Michael Fuhr wrote:

On Thu, Jan 27, 2005 at 06:23:21PM -0600, Jim C. Nasby wrote:

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.

Ordinary users can set sort_mem/work_mem, at least in 8.0.0 and
earlier. Does setting that variable per session actually have an
effect, and if it does, does allowing ordinary users to set it
present a risk?

I ass-u-me that the setting actually takes effect. You could test it by
setting it high enough to send a system into swapping and run a query
that will use enough sort_mem. If it starts swapping, it worked. :)

Yes, there is a risk allowing ad-hoc settings; you can starve the
machine for memory.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#8Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Lonni J Friedman (#6)
Re: changing sort_mem on the fly?

On Thu, Jan 27, 2005 at 05:52:41PM -0800, Lonni J Friedman wrote:

On Thu, 27 Jan 2005 18:23:21 -0600, Jim C. Nasby <decibel@decibel.org> wrote:

On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:

Is there any way to increase sort_mem without having to perform a full
restart of the DB? Will just doing a 'kill -HUP $PID" work as long as
i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.

I need to do it for all future connections, not just one.

'm guessing pg_ctl reload would take effect on all new connections. Try
it, and see what psql -c 'show sort_mem' says.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#9Lonni J Friedman
netllama@gmail.com
In reply to: Jim Nasby (#8)
Re: changing sort_mem on the fly?

On Thu, 27 Jan 2005 20:05:19 -0600, Jim C. Nasby <decibel@decibel.org> wrote:

On Thu, Jan 27, 2005 at 05:52:41PM -0800, Lonni J Friedman wrote:

On Thu, 27 Jan 2005 18:23:21 -0600, Jim C. Nasby <decibel@decibel.org> wrote:

On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:

Is there any way to increase sort_mem without having to perform a full
restart of the DB? Will just doing a 'kill -HUP $PID" work as long as
i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.

I need to do it for all future connections, not just one.

'm guessing pg_ctl reload would take effect on all new connections. Try
it, and see what psql -c 'show sort_mem' says.

excellent, that worked! thanks!

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org

#10Neil Conway
neilc@samurai.com
In reply to: Jim Nasby (#7)
Re: changing sort_mem on the fly?

On Thu, 2005-01-27 at 20:03 -0600, Jim C. Nasby wrote:

Yes, there is a risk allowing ad-hoc settings; you can starve the
machine for memory.

A malicious user who can execute SELECT queries can already consume an
arbitrary amount of memory -- say, by disabling GEQO and self-joining
pg_class to itself 50 times. I'm not sure that letting users modify
sort_mem/work_mem actually increases the risk from malicious users.
Restricting this parameter to superusers only would also be a hit to
usability.

-Neil

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#10)
Re: changing sort_mem on the fly?

Neil Conway <neilc@samurai.com> writes:

A malicious user who can execute SELECT queries can already consume an
arbitrary amount of memory -- say, by disabling GEQO and self-joining
pg_class to itself 50 times. I'm not sure that letting users modify
sort_mem/work_mem actually increases the risk from malicious users.

The correct place for a sysadmin to limit memory usage would be in the
ulimit settings the postmaster starts under. Of course, Neil's argument
still holds in general: anyone who can write arbitrary queries is not
going to have any difficulty in soaking up unreasonable amounts of
resources. Trying to restrict that would probably make the system less
useful rather than more so.

regards, tom lane

#12Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#11)
Re: changing sort_mem on the fly?

On Fri, Jan 28, 2005 at 02:43:13AM -0500, Tom Lane wrote:

Neil Conway <neilc@samurai.com> writes:

A malicious user who can execute SELECT queries can already consume an
arbitrary amount of memory -- say, by disabling GEQO and self-joining
pg_class to itself 50 times. I'm not sure that letting users modify
sort_mem/work_mem actually increases the risk from malicious users.

The correct place for a sysadmin to limit memory usage would be in the
ulimit settings the postmaster starts under. Of course, Neil's argument
still holds in general: anyone who can write arbitrary queries is not
going to have any difficulty in soaking up unreasonable amounts of
resources. Trying to restrict that would probably make the system less
useful rather than more so.

Just to clarify, I wasn't suggesting limiting sort_mem to only admins, I
was only pointing out that it's possible for a user to starve the box of
memory with it. I'd also point out that IMHO it's far more likely that
someone could accidentally hose the box with sort_mem than with a
SELECT. In the year 2020 when there's a bunch of bored developers
sitting around wondering what to do, it might be nice to have the
ability to apply some security to GUC settings, possibly just being able
to restrict them to super-user, and having a SECURITY DEFINER function
users can call instead that could apply permissions.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#13Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#11)
Re: changing sort_mem on the fly?

Tom Lane wrote:

The correct place for a sysadmin to limit memory usage would be in the
ulimit settings the postmaster starts under. Of course, Neil's argument
still holds in general: anyone who can write arbitrary queries is not
going to have any difficulty in soaking up unreasonable amounts of
resources. Trying to restrict that would probably make the system less
useful rather than more so.

I'm not sure if I agree that there's no potential for implementing
better resource limits/quotas in PG in the future, I was just pointing
out that it would require a lot more work to prevent resource
consumption by malicious users than merely limiting who is allowed to
set sort_mem/work_mem. If you could implement per-user/per-connection
limits on things like processor usage or disk space consumption, I think
that would be useful to some users (e.g. people offering PG in a web
hosting environment).

-Neil

#14Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Neil Conway (#13)
Re: changing sort_mem on the fly?

On Sat, Jan 29, 2005 at 09:23:06PM +1100, Neil Conway wrote:

Tom Lane wrote:

The correct place for a sysadmin to limit memory usage would be in the
ulimit settings the postmaster starts under. Of course, Neil's argument
still holds in general: anyone who can write arbitrary queries is not
going to have any difficulty in soaking up unreasonable amounts of
resources. Trying to restrict that would probably make the system less
useful rather than more so.

I'm not sure if I agree that there's no potential for implementing
better resource limits/quotas in PG in the future, I was just pointing
out that it would require a lot more work to prevent resource
consumption by malicious users than merely limiting who is allowed to
set sort_mem/work_mem. If you could implement per-user/per-connection
limits on things like processor usage or disk space consumption, I think
that would be useful to some users (e.g. people offering PG in a web
hosting environment).

Since you brought up the future... :)

I'd really like to see an improvement in how sort_mem/work_mem is
handled. It's currently impossible to set it in a way to support
moderately large sorts (say, 5% of available memory) without either
embedding 'set sort_mem = blah' in your code or running the risk that at
some point your database server will start swapping. I wish there was
some way to limit total sort size for the entire system, or at least per
connection/query.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#15Neil Conway
neilc@samurai.com
In reply to: Jim Nasby (#14)
Re: changing sort_mem on the fly?

Jim C. Nasby wrote:

I'd really like to see an improvement in how sort_mem/work_mem is
handled.

So would I :) (I think it's well known that the current system is not
optimal.)

Do you have any thoughts on how to improve it?

-Neil

#16Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Neil Conway (#15)
Re: changing sort_mem on the fly?

On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote:

Jim C. Nasby wrote:

I'd really like to see an improvement in how sort_mem/work_mem is
handled.

So would I :) (I think it's well known that the current system is not
optimal.)

Do you have any thoughts on how to improve it?

See http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php
and http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php

There could certainly be better ways than what I proposed, too. It might
be worth investigating how the 'big 3' handle it, though some of what
they do might be patented.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#16)
Re: changing sort_mem on the fly?

"Jim C. Nasby" <decibel@decibel.org> writes:

On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote:

Do you have any thoughts on how to improve it?

See http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php
and http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php

All of this falls down on twin problems: (1) we have no portable way to
find out how much RAM is really available, and (2) the planner has to
make a cost estimate for the sort in advance of actual runtime --- which
requires making some assumption about the value of sort_mem that will be
used.

regards, tom lane

#18Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#17)
Re: changing sort_mem on the fly?

On Sun, Jan 30, 2005 at 01:05:15PM -0500, Tom Lane wrote:

"Jim C. Nasby" <decibel@decibel.org> writes:

On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote:

Do you have any thoughts on how to improve it?

See http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php
and http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php

All of this falls down on twin problems: (1) we have no portable way to
find out how much RAM is really available, and (2) the planner has to
make a cost estimate for the sort in advance of actual runtime --- which
requires making some assumption about the value of sort_mem that will be
used.

As I said, I'm sure there's better minds who can come up with better
ideas than I. :) I personally think this is important enough to warrant
a TODO so those minds can think about it at some point, but that's just
my opinion.

As for your two points: if you mean how much RAM is available for
sorting, I assumed there would be a GUC for that, although
effective_cache_size might be used.

I thought that the planner did estimate sort size when it called the
sort code, but it's been a long time since I looked at it and it was
somewhat over my head anyway. Since the planner knows how many rows will
be going into the sort and how wide they are, ISTM it should be able to
estimate how much memory will be needed. Even if it can't, I think you
could still get by with just keeping track of how much memory a sort is
using when it asks for more memory. At some point it would hit a
threshold and would then spill to disk.

BTW, if it would be useful, I could test sort speeds for sorts that
spill to disk with different sort_mem settings. That would help answer
the question of how much it would hurt for a sort that spills to disk to
give back some of it's memory so that other sort operations wouldn't
spill to disk.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#18)
Re: changing sort_mem on the fly?

"Jim C. Nasby" <decibel@decibel.org> writes:

Since the planner knows how many rows will
be going into the sort and how wide they are, ISTM it should be able to
estimate how much memory will be needed.

... which is different from how much will be available. See cost_sort():

* If the total volume of data to sort is less than work_mem, we will do
* an in-memory sort, which requires no I/O and about t*log2(t) tuple
* comparisons for t tuples.
*
* If the total volume exceeds work_mem, we switch to a tape-style merge
* algorithm. There will still be about t*log2(t) tuple comparisons in
* total, but we will also need to write and read each tuple once per
* merge pass. We expect about ceil(log6(r)) merge passes where r is the
* number of initial runs formed (log6 because tuplesort.c uses six-tape
* merging). Since the average initial run should be about twice work_mem,
* we have
* disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem)))
* cpu = comparison_cost * t * log2(t)

The actual cost of a sort is therefore *highly* sensitive to how much
memory it is allowed to use. Blowing off any ability to estimate this
in advance is not going to improve matters...

regards, tom lane

#20Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#19)
Re: changing sort_mem on the fly?

On Sun, Jan 30, 2005 at 04:49:39PM -0500, Tom Lane wrote:

"Jim C. Nasby" <decibel@decibel.org> writes:

Since the planner knows how many rows will
be going into the sort and how wide they are, ISTM it should be able to
estimate how much memory will be needed.

... which is different from how much will be available. See cost_sort():

Ok, I wasn't sure which you were refering to. As I mentioned in the
earlier thread, there would have to be some means of accounting for how
much memory active sorts in the system are using. One possibility is
that a global counter is updated any time a sort allocates or frees
memory. If allocations are done 8k at a time that would probably be too
expensive, but I suspect it wouldn't be too bad with larger allocations
that are done less frequently. And bear in mind that if these changes
prevent even a few sorts an hour from spilling to disk then it's
probably still a net gain.

* If the total volume of data to sort is less than work_mem, we will do
* an in-memory sort, which requires no I/O and about t*log2(t) tuple
* comparisons for t tuples.
*
* If the total volume exceeds work_mem, we switch to a tape-style merge
* algorithm. There will still be about t*log2(t) tuple comparisons in
* total, but we will also need to write and read each tuple once per
* merge pass. We expect about ceil(log6(r)) merge passes where r is the
* number of initial runs formed (log6 because tuplesort.c uses six-tape
* merging). Since the average initial run should be about twice work_mem,
* we have
* disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem)))
* cpu = comparison_cost * t * log2(t)

The actual cost of a sort is therefore *highly* sensitive to how much
memory it is allowed to use. Blowing off any ability to estimate this
in advance is not going to improve matters...

Doesn't the sort code also have a provision for starting an in-memory
sort, and then realizing that it's going to exceed work_mem and going to
disk at that point?

Yes, without something as simple as work_mem it is harder to determine
how much memory you can use, though I don't think it's impossible.
First, a global variable/counter could be used to keep track of how much
memory all the active sorts estimate they'll need. That would be a
first-order guess. This could also be compared to how much memory is
actually being used, though I suspect that would require globally
tracking how much memory each sort estimates it will need and how much
it's already using. Of course there would also be some kind of policy
limiting sorts from taking over all of sort_mem; possibly a GUC,
possibly something more dynamic. Whatever it is, that gives you an
upper-bounds for how much memory would be used at a given point in time.

BTW, something that just occured to me... with the new cache management
code (ARC and whatever's going to replace it), presumably it will now be
better for PostgreSQL to primarily do caching instead of the OS. This
means you'd want to give PostgreSQL the lion's share of memory, but with
how sorts currently aquire memory doing so would be risky. Should there
be an option in 8.0 for sorts to use shared memory instead of allocating
their own from a (possibly small) pool of memory the OS has?
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"