Idea about better configuration options for sort memory
We frequently recommend to people that they increase sort_mem while
creating btree indexes. It is reasonable to have a larger setting
for that purpose, since (1) a single backend isn't going to be doing
multiple index creations in parallel (whereas complex queries could
easily be doing multiple sorts or hashes in parallel), and (2) in most
installations you won't have a large number of backends doing index
creations in parallel. So while sort_mem has to be set on the
assumption that you might need quite a few times the nominal setting,
this isn't true for index creation.
It strikes me that we ought to revise the configuration options to
reflect this fact: index creation's memory limit should be driven by
a separate parameter instead of using sort_mem.
We already have a memory-usage parameter that is larger than sort_mem,
and for exactly the same reasons sketched above. It's vacuum_mem.
VACUUM is also an operation that you don't expect to be running lots of
instances of in parallel, so it's okay for it to eat more than average
amounts of RAM.
So, what I'd like to do is make btree index creation pay attention to
vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
some more-generic name indicating that it's used for more than just
VACUUM. Any objections so far?
Now, what should we call it instead? I haven't come up with any
compelling thoughts --- the best I can do is "big_sort_mem" or
"single_sort_mem". Surely someone out there has a better idea.
BTW, does anyone want to lobby for renaming sort_mem at the same time?
Since it's used for sizing hash tables as well as sort workspace, it's
rather misnamed. I hesitate to rename it because of the potential for
confusion though. People are pretty used to the existing name.
regards, tom lane
On Sat, 31 Jan 2004, Tom Lane wrote:
So, what I'd like to do is make btree index creation pay attention to
vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
some more-generic name indicating that it's used for more than just
VACUUM. Any objections so far?
Why not create a seperate index_mem variable instead? index creation
tends to be, I think, less frequent then vacuum, so having a higher value
for index_mem then vacuum_mem may make sense ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes:
On Sat, 31 Jan 2004, Tom Lane wrote:
So, what I'd like to do is make btree index creation pay attention to
vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
some more-generic name indicating that it's used for more than just
VACUUM. Any objections so far?
Why not create a seperate index_mem variable instead? index creation
tends to be, I think, less frequent then vacuum, so having a higher value
for index_mem then vacuum_mem may make sense ...
Well, maybe. What's in the back of my mind is that we may come across
other cases besides CREATE INDEX and VACUUM that should use a "one-off"
setting. I think it'd make more sense to have one parameter than keep
on inventing new ones. For comparison, SortMem is used for quite a few
different purposes, but I can't recall anyone needing to tweak an
individual one of those purposes other than CREATE INDEX.
regards, tom lane
Tom Lane wrote:
Now, what should we call it instead? I haven't come up with any
compelling thoughts --- the best I can do is "big_sort_mem" or
"single_sort_mem". Surely someone out there has a better idea.BTW, does anyone want to lobby for renaming sort_mem at the same time?
Since it's used for sizing hash tables as well as sort workspace, it's
rather misnamed. I hesitate to rename it because of the potential for
confusion though. People are pretty used to the existing name.
Hmmm ... maybe query_work_mem and maintenance_work_mem, or something
similar?
Joe
On Sat, 31 Jan 2004, Tom Lane wrote:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
On Sat, 31 Jan 2004, Tom Lane wrote:
So, what I'd like to do is make btree index creation pay attention to
vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
some more-generic name indicating that it's used for more than just
VACUUM. Any objections so far?Why not create a seperate index_mem variable instead? index creation
tends to be, I think, less frequent then vacuum, so having a higher value
for index_mem then vacuum_mem may make sense ...Well, maybe. What's in the back of my mind is that we may come across
other cases besides CREATE INDEX and VACUUM that should use a "one-off"
setting. I think it'd make more sense to have one parameter than keep
on inventing new ones. For comparison, SortMem is used for quite a few
different purposes, but I can't recall anyone needing to tweak an
individual one of those purposes other than CREATE INDEX.
Why not a 'default_mem' parameter that auto-sets the others if not
explicitly set? note that, at least in my case, I didn't know that
sort_mem affected CREATE INDEX, only ORDER/GORUP BYs ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Joe Conway <mail@joeconway.com> writes:
Tom Lane wrote:
BTW, does anyone want to lobby for renaming sort_mem at the same time?
Since it's used for sizing hash tables as well as sort workspace, it's
rather misnamed. I hesitate to rename it because of the potential for
confusion though. People are pretty used to the existing name.
Hmmm ... maybe query_work_mem and maintenance_work_mem, or something
similar?
I'll go with these unless someone has another proposal ...
regards, tom lane
Tom Lane wrote:
Joe Conway <mail@joeconway.com> writes:
Tom Lane wrote:
BTW, does anyone want to lobby for renaming sort_mem at the same time?
Since it's used for sizing hash tables as well as sort workspace, it's
rather misnamed. I hesitate to rename it because of the potential for
confusion though. People are pretty used to the existing name.Hmmm ... maybe query_work_mem and maintenance_work_mem, or something
similar?I'll go with these unless someone has another proposal ...
The only confusion is that you can use multiple query_work_mem per
query, but I can't think of a better name.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom Lane wrote:
Joe Conway <mail@joeconway.com> writes:
Hmmm ... maybe query_work_mem and maintenance_work_mem, or something
similar?I'll go with these unless someone has another proposal ...
The only confusion is that you can use multiple query_work_mem per
query, but I can't think of a better name.
True. Maybe just "work_mem" and "maintenance_work_mem"?
BTW, I am going to look at whether GUC can be persuaded to continue to
allow "sort_mem" as an alternate name, if we rename it. That would
alleviate most of the backward-compatibility issues of changing such
a well-known parameter name.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom Lane wrote:
Joe Conway <mail@joeconway.com> writes:
Hmmm ... maybe query_work_mem and maintenance_work_mem, or something
similar?I'll go with these unless someone has another proposal ...
The only confusion is that you can use multiple query_work_mem per
query, but I can't think of a better name.True. Maybe just "work_mem" and "maintenance_work_mem"?
BTW, I am going to look at whether GUC can be persuaded to continue to
allow "sort_mem" as an alternate name, if we rename it. That would
alleviate most of the backward-compatibility issues of changing such
a well-known parameter name.
Good. It is not like we have a huge namespace limitation in there. I
wonder if we could cost it as a list of string pointers, null
terminated.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom Lane wrote:
BTW, I am going to look at whether GUC can be persuaded to continue to
allow "sort_mem" as an alternate name, if we rename it. That would
alleviate most of the backward-compatibility issues of changing such
a well-known parameter name.
Good. It is not like we have a huge namespace limitation in there. I
wonder if we could cost it as a list of string pointers, null
terminated.
After looking at the code a bit, I think the simplest solution is for
find_option to look in a separate mapping table (mapping from old to new
option name) if it doesn't find the given name in the main table. This
would make lookup of "old" names a shade slower than "preferred" names,
but that doesn't seem like a problem.
With this approach, old GUC names would be recognized in SHOW and SET
commands, as well as the other ways you can set a variable by name
(postgresql.conf, ALTER USER SET, etc). But only the new names would
appear in SHOW ALL or the pg_settings view. Does that seem OK?
regards, tom lane
Tom Lane wrote:
With this approach, old GUC names would be recognized in SHOW and SET
commands, as well as the other ways you can set a variable by name
(postgresql.conf, ALTER USER SET, etc). But only the new names would
appear in SHOW ALL or the pg_settings view. Does that seem OK?
May be raise a warning if the old GUC is used, can speed the
adoption of new GUCs variables.
Regards
Gaetano Mendola
Tom Lane wrote:
With this approach, old GUC names would be recognized in SHOW and SET
commands, as well as the other ways you can set a variable by name
(postgresql.conf, ALTER USER SET, etc). But only the new names would
appear in SHOW ALL or the pg_settings view. Does that seem OK?
Seems OK to me, in fact maybe preferred. But I wonder if we should emit
a NOTICE when old names are used with SHOW and SET commands?
Joe
Seems OK to me, in fact maybe preferred. But I wonder if we should emit
a NOTICE when old names are used with SHOW and SET commands?
A WARNING should be issued.
Chris
On Sat, 31 Jan 2004, Tom Lane wrote:
Now, what should we call it instead? I haven't come up with any
compelling thoughts --- the best I can do is "big_sort_mem" or
"single_sort_mem". Surely someone out there has a better idea.
vacuuming and indexing are not too frequent database administration tasks.
administration_mem ? admin_mem ? system_mem ? system_task_mem ?
systask_mem ? executive_mem ?
Anything may be chosen, but the point is to have a good comment within the
template configuration file.
I would nevertheless avoid "big".
What is big changes overtime with computers;-)
--
Fabien.
On Sat, 31 Jan 2004, Tom Lane wrote:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
On Sat, 31 Jan 2004, Tom Lane wrote:
So, what I'd like to do is make btree index creation pay
attention to> vacuum_mem instead of sort_mem, and rename the
vacuum_mem parameter to> some more-generic name indicating that
it's used for more than just> VACUUM. Any objections so far?Why not create a seperate index_mem variable instead? index
creation tends to be, I think, less frequent then vacuum, so
having a higher value for index_mem then vacuum_mem may make sense
...Well, maybe. What's in the back of my mind is that we may come
across other cases besides CREATE INDEX and VACUUM that should use a
"one-off" setting. I think it'd make more sense to have one
parameter than keep on inventing new ones. For comparison, SortMem
is used for quite a few different purposes, but I can't recall
anyone needing to tweak an individual one of those purposes other
than CREATE INDEX.
I don't know if this would apply here - but foriegn key creation also
benefits hugely from jacking up sort_mem and you also don't do too many
of those in parellel.
I'm guessing it would be quite in-elegant and kludgy to make that code
use the bigger pool.. it would benefit restore times though.
--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Tom Lane wrote:
After looking at the code a bit, I think the simplest solution is for
find_option to look in a separate mapping table (mapping from old to new
option name) if it doesn't find the given name in the main table. This
would make lookup of "old" names a shade slower than "preferred" names,
but that doesn't seem like a problem.With this approach, old GUC names would be recognized in SHOW and SET
commands, as well as the other ways you can set a variable by name
(postgresql.conf, ALTER USER SET, etc). But only the new names would
appear in SHOW ALL or the pg_settings view. Does that seem OK?
Sounds good. The idea that we can keep the names constant seems to be a
losing proposal.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Jeff <threshar@torgo.978.org> writes:
On Sat, 31 Jan 2004, Tom Lane wrote:
Well, maybe. What's in the back of my mind is that we may come
across other cases besides CREATE INDEX and VACUUM that should use a
"one-off" setting. I think it'd make more sense to have one
parameter than keep on inventing new ones.
I don't know if this would apply here - but foriegn key creation also
benefits hugely from jacking up sort_mem and you also don't do too many
of those in parellel.
I'm guessing it would be quite in-elegant and kludgy to make that code
use the bigger pool.. it would benefit restore times though.
Actually, it wouldn't be all that hard. We could make
RI_Initial_Check() do the equivalent of "SET LOCAL work_mem" before
issuing the query, and then again afterwards to restore the prior
value. This would have no permanent effect on work_mem, because the
old value would be restored by transaction abort if the check query
fails.
This seems like a good idea to me, so I'll do it unless I hear
objections.
regards, tom lane
On Mon, 2 Feb 2004, Tom Lane wrote:
Jeff <threshar@torgo.978.org> writes:
On Sat, 31 Jan 2004, Tom Lane wrote:
Well, maybe. What's in the back of my mind is that we may come
across other cases besides CREATE INDEX and VACUUM that should use a
"one-off" setting. I think it'd make more sense to have one
parameter than keep on inventing new ones.I don't know if this would apply here - but foriegn key creation also
benefits hugely from jacking up sort_mem and you also don't do too many
of those in parellel.I'm guessing it would be quite in-elegant and kludgy to make that code
use the bigger pool.. it would benefit restore times though.Actually, it wouldn't be all that hard. We could make
RI_Initial_Check() do the equivalent of "SET LOCAL work_mem" before
issuing the query, and then again afterwards to restore the prior
value. This would have no permanent effect on work_mem, because the
old value would be restored by transaction abort if the check query
fails.This seems like a good idea to me, so I'll do it unless I hear
objections.
any chance of having some kind of max_total_sort_mem setting to keep
machines out of swap storms, or would that be a nightmare to implement?
"scott.marlowe" <scott.marlowe@ihs.com> writes:
any chance of having some kind of max_total_sort_mem setting to keep
machines out of swap storms, or would that be a nightmare to implement?
I don't see any reasonable way to do that.
regards, tom lane
On Mon, 2 Feb 2004, Tom Lane wrote:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
any chance of having some kind of max_total_sort_mem setting to keep
machines out of swap storms, or would that be a nightmare to implement?I don't see any reasonable way to do that.
I didn't think there was. just hoping... :-)