Simple postgresql.conf wizard
One of the long-term goals I'm working toward is wrapping a "wizard"
interface around the tuning guidelines described by
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server now that
those have gone through a fair amount of peer review. Attached is a first
simple version of such a wizard, implemented in Python. Right now what it
does is look at the amount of memory in your system and adjust shared_buffers
and effective_cache_size. So if we started with this stock configuration:
shared_buffers = 32MB # min 128kB or
#effective_cache_size = 128MB
And run it like this (from my system with 8GB of RAM):
./pg-generate-conf /home/gsmith/data/guc/postgresql.conf
You'd get a new file with these lines in it printed to standard out:
# shared_buffers = 32MB # min 128kB or
#effective_cache_size = 128MB
shared_buffers = 1931MB # pg_generate_conf wizard 2008-11-01
effective_cache_size = 5793MB # pg_generate_conf wizard 2008-11-01
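The before/after above amounts to: comment out any live copy of a setting being tuned, then append the new value with a wizard tag. A hypothetical minimal re-implementation of that step (not the attached script):

```python
# Sketch of the wizard's rewrite step: comment out existing copies of the
# settings being tuned, then append the tuned values with a dated tag.
# (Hypothetical re-implementation; the real pg-generate-conf differs.)
import re
from datetime import date

def rewrite_conf(conf_text, new_settings, tag="pg_generate_conf wizard"):
    """Return conf_text with old copies of new_settings commented out
    and the tuned values appended at the end."""
    out_lines = []
    names = set(new_settings)
    for line in conf_text.splitlines():
        # Match "name = value" whether or not it is already commented.
        m = re.match(r"\s*#?\s*(\w+)\s*=", line)
        if m and m.group(1) in names and not line.lstrip().startswith("#"):
            out_lines.append("# " + line)  # preserve the old value as a comment
        else:
            out_lines.append(line)
    stamp = date.today().isoformat()
    for name, value in new_settings.items():
        out_lines.append("%s = %s # %s %s" % (name, value, tag, stamp))
    return "\n".join(out_lines) + "\n"

print(rewrite_conf("shared_buffers = 32MB\n#effective_cache_size = 128MB",
                   {"shared_buffers": "1931MB",
                    "effective_cache_size": "5793MB"}))
```

Already-commented lines are left alone, so re-running the tool on its own output doesn't stack up comment markers.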
While I've got a pretty clear vision for what I'm doing with this next and
will kick off a pgfoundry project real soon, I wanted to throw this out as
a WIP for feedback at this point. I was ultimately hoping to one day have
something like this shipped as a contrib/ module to address the constant
requests for such a thing. I know it would be unreasonable to expect
something in this state to make it into the 8.4 contrib at this late hour.
But since it's independent of the core database stuff I figured I'd make
it available right at the wire here today in the off chance that did seem
a reasonable proposition to anybody. It is already a big improvement over
no tuning at all, and since it's a simple-to-change script I will rev this
based on feedback pretty fast now that the most boring guts are done.
Possible feedback topics:
-Setting the next round of values requires asking the user for some input
before making recommendations. Is it worth building a curses-based
interface to updating the values? That would be really helpful for people
with only ssh access to their server, but barring something like WCurses
I'm not sure it would help on Windows.
-How about a GUI one with Python's Tkinter interface? Now Windows isn't a
problem, but people using ssh aren't going to be as happy.
-I'm not sure if there's any useful replacement for the os.sysconf
interface I'm using to grab the memory information on the popular Windows
Python ports. Some of the other projects I looked at that tried to
abstract that OS interaction more didn't seem much better here (i.e. the
PSI library which doesn't support Windows either)
-Stepping back a bit from this particular code, is something in Python
like this ever going to be appropriate to ship as a contrib module?
There seems to be a bit more traction in this community for using Perl for
such things; I might do a Perl port of this one day but that's not going
to happen soon.
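For context, the os.sysconf approach mentioned in the third topic above boils down to two POSIX values: page size times number of physical pages. A minimal sketch (the error handling here is my assumption, not taken from the attached script):

```python
# Total RAM via POSIX sysconf: page size * physical page count. Both
# names are POSIX, so this fails cleanly on the Windows Python ports
# (os.sysconf is missing there entirely).
import os

def detect_total_memory():
    """Return total physical memory in bytes, or None if undetectable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None  # unknown name, unsupported platform, or no os.sysconf
    if page_size < 0 or page_count < 0:
        return None  # sysconf reports "no limit/unsupported" as -1
    return page_size * page_count

mem = detect_total_memory()
print("detected %s bytes" % mem if mem else "detection failed; ask the admin")
```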
I think that's enough flametastic material now, and I do plan to join in
on patch review in penance for the disruption I've introduced here. The
next specific thing I'm doing with this regardless is making it read and
respect the min/max values for settings as well as the rest of the
information available from pg_settings. I eventually want to support all
the syntax suggested for pg_generate_config described at
http://wiki.postgresql.org/wiki/GUCS_Overhaul but for the moment I'm not
being so ambitious. Some of that is aimed at making a pg_generate_conf
that is capable of replacing the sample postgresql.conf file, which is a
couple of steps away from where I'm at right now.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Attachments:
pg-generate-conf-v1 (text/plain, US-ASCII)
Greg Smith wrote:
While I've got a pretty clear vision for what I'm doing with this next
and will kick off a pgfoundry project real soon, I wanted to throw
this out as a WIP for feedback at this point. I was ultimately hoping
to one day have something like this shipped as a contrib/ module to
address the constant requests for such a thing. I know it would be
unreasonable to expect something in this state to make it into the 8.4
contrib at this late hour. But since it's independent of the core
database stuff I figured I'd make it available right at the wire here
today in the off chance that did seem a reasonable proposition to
anybody. It is already a big improvement over no tuning at all, and
since it's a simple to change script I will rev this based on feedback
pretty fast now that the most boring guts are done.
I've picked this up to look at it. Will give feedback once I have a
python installed.
-Setting the next round of values requires asking the user for some
input before making recommendations. Is it worth building a
curses-based interface to updating the values? That would be really
helpful for people with only ssh access to their server, but barring
something like WCurses I'm not sure it would help on Windows.
I'd start with command-line switches, e.g.
config --memory=32GB --type=DW --size=500GB --connections=20
-I'm not sure if there's any useful replacement for the os.sysconf
interface I'm using to grab the memory information on the popular
Windows Python ports. Some of the other projects I looked at that
tried to abstract that OS interaction more didn't seem much better
here (i.e. the PSI library which doesn't support Windows either)
I think in initial versions we should just get the info from the admin.
I've explored the issues around getting OS-independent accurate system
stats, and they are many.
-Stepping back a bit from this particular code, is something in Python
like this ever going to be appropriate to ship as a contrib module?
There seems to be a bit more traction in this community for using Perl
for such things; I might do a Perl port of this one day but that's not
going to happen soon.
If you do a python version, others will write versions in other
languages. I personally don't really care; Perl's main advantage is
that it's pre-installed on more OSes than Python is.
Anyway, more feedback once I get it running.
--Josh
On Sun, 2 Nov 2008, Josh Berkus wrote:
I'd start with command-line switches, e.g.
config --memory=32GB --type=DW --size=500GB --connections=20
That seems reasonable; I think I'll push a fancier UI onto the back burner
then and just spec out an options interface like this one.
I think in initial versions we should just get the info from the admin. I've
explored the issues around getting OS-independent accurate system stats, and
they are many.
I'm aware how thorny a perfect solution is here. One thing that's nice
about Python is that there are two interfaces for getting system
information built-in, the os.sysconf I used already and the
distutils.sysconfig one, aimed more at C-level module writers. Far as
I've been able to tell it's not that hard to get something that works on
most platforms to auto-detect memory and architecture, and I've realized
the latter is kind of important because it determines how big you can make
some things on 32-bit platforms.
After some digging I see there isn't any good interface built-in for
Windows, but it's not hard to ask a DLL for the information. I think it's
reasonable to try and detect total memory+bit width, allow overriding
that, and if detection fails and nothing was specified to error out.
Should make a good balance of reasonable automatic behavior in a lot of
cases, while still allowing the admin to adjust. There's a completely
valid use-case for that even not considering detection failures, where
you're building a config file on a system other than the one it's being
deployed onto.
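The detect-with-override flow described above could look like the sketch below. The Win32 branch uses kernel32's GlobalMemoryStatusEx; the overall shape (override wins, detection failure raises) follows the paragraph above, but the code itself is my assumption, not the actual script:

```python
# Detect total RAM with an explicit override, per the plan above:
# override wins, Windows asks kernel32.dll, POSIX uses sysconf, and
# detection failure errors out so the admin can specify --memory.
import ctypes
import os
import sys

class MEMORYSTATUSEX(ctypes.Structure):
    # Field layout of the Win32 MEMORYSTATUSEX struct.
    _fields_ = [("dwLength", ctypes.c_uint32),
                ("dwMemoryLoad", ctypes.c_uint32),
                ("ullTotalPhys", ctypes.c_uint64),
                ("ullAvailPhys", ctypes.c_uint64),
                ("ullTotalPageFile", ctypes.c_uint64),
                ("ullAvailPageFile", ctypes.c_uint64),
                ("ullTotalVirtual", ctypes.c_uint64),
                ("ullAvailVirtual", ctypes.c_uint64),
                ("ullAvailExtendedVirtual", ctypes.c_uint64)]

def total_memory(override=None):
    """Return total RAM in bytes; an explicit override always wins."""
    if override is not None:
        return override
    if sys.platform == "win32":
        status = MEMORYSTATUSEX()
        status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
        if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
            raise RuntimeError("memory detection failed; use --memory")
        return status.ullTotalPhys
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError):
        raise RuntimeError("memory detection failed; use --memory")

# The override path also covers building a config for a different host.
print(total_memory(override=8 * 1024 ** 3))
```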
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Sun, 2008-11-02 at 17:10 -0800, Josh Berkus wrote:
Greg Smith wrote:
-Stepping back a bit from this particular code, is something in Python
like this ever going to be appropriate to ship as a contrib module?
I think python should be as good as any relatively widespread language.
Unless we require all contrib to be in C.
There seems to be a bit more traction in this community for using Perl
for such things; I might do a Perl port of this one day but that's not
going to happen soon.
The advantage of python over perl is that anybody who knows at least one
programming language is able to learn basic python in an afternoon.
If you do a python version, others will write versions in other
languages.
Yeah, if python is not accepted as contrib, then it can probably be
rewritten in C once it has stabilized enough.
I personally don't really care; Perl's main advantage is
that it's pre-installed on more OSes than Python is.
I think most (if not all) modern OSes' standard setups include both perl
and python. Except of course Windows, which probably includes neither.
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
On Sat, 2008-11-01 at 22:02 -0400, Greg Smith wrote:
Possible feedback topics:
-Setting the next round of values requires asking the user for some input
before making recommendations. Is it worth building a curses-based
interface to updating the values? That would be really helpful for people
with only ssh access to their server, but barring something like WCurses
I'm not sure it would help on Windows.
-How about a GUI one with Python's Tkinter interface? Now Windows isn't a
problem, but people using ssh aren't going to be as happy.
You could have it in two parts - an agent running on the server, started
over ssh, and a GUI interface running on the user's workstation which
talks to said agent.
And I'd suggest you use wxPython for the GUI part if you want a relatively
modern look.
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
Greg Smith wrote:
One of the long-term goals I'm working toward is wrapping a "wizard"
interface around the tuning guidelines described by
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server now that
those have gone through a fair amount of peer review. Attached is a
first simple version of such a wizard, implemented in Python. Right now
what it does is look at the amount of memory in your system and adjust
shared_buffers and effective_cache_size. So if we started with this
stock configuration:
That's really nice. I have been playing with that idea also (and in
Python) but not really having time to do more than some simple tests.
So it's nice to see we have something to base development on.
Thanks for starting!
Regards
Tino
Greg Smith wrote:
On Sun, 2 Nov 2008, Josh Berkus wrote:
I'd start with command-line switches, e.g.
config --memory=32GB --type=DW --size=500GB --connections=20
That seems reasonable; I think I'll push a fancier UI onto the back burner
then and just spec out an options interface like this one.
I think in initial versions we should just get the info from the
admin. I've explored the issues around getting OS-independent
accurate system stats, and they are many.
I'm aware how thorny a perfect solution is here. One thing that's nice
about Python is that there are two interfaces for getting system
information built-in, the os.sysconf I used already and the
distutils.sysconfig one, aimed more at C-level module writers. Far as
I've been able to tell it's not that hard to get something that works on
most platforms to auto-detect memory and architecture, and I've realized
the latter is kind of important because it determines how big you can
make some things on 32-bit platforms.
After some digging I see there isn't any good interface built-in for
Windows, but it's not hard to ask a DLL for the information. I think
it's reasonable to try and detect total memory+bit width, allow
overriding that, and if detection fails and nothing was specified to
error out. Should make a good balance of reasonable automatic behavior
in a lot of cases, while still allowing the admin to adjust. There's a
completely valid use-case for that even not considering detection
failures, where you're building a config file on a system other than the
one it's being deployed onto.
Let me know if you need help to write whatever's needed to detect that
stuff on Windows. It shouldn't be too hard, as long as you can live with
a couple of lines of win32-specific code.. :-)
//Magnus
Hannu Krosing wrote:
If you do a python version, others will write versions in other
languages.
Yeah, if python is not accepted as contrib, then it can probably be
rewritten in C once it has stabilized enough.
It could. The question is if it makes sense to write something like this
in C, really ;) It might get slightly more portable, at the expense of a
lot more work.
I see no reason why we should on principle reject a python based program
from contrib. We already have stuff there in shellscript which is
actually *less* portable... As long as it's not a core utility needed to
get postgresql working, I think it's fine.
I personally don't really care; Perl's main advantage is
that it's pre-installed on more OSes than Python is.
I think most (if not all) modern OSes' standard setups include both perl
and python. Except of course Windows, which probably includes neither.
Windows ships with neither of the two languages (and you *really* don't
want to write it in vbscript or jscript which is what it does ship with
- other than .BAT). But they both have easy installers you can use to
get it in there - I don't see this as any difference between the two.
And I'll second the comment that I think most reasonably modern
platforms will ship with both of them. AFAIK, many of the newer linuxen
use python based stuff as part of the core installer functionality, for
example.
//Magnus
On 2008-11-02, at 02:02, Greg Smith wrote:
Possible feedback topics:
-Setting the next round of values requires asking the user for some
input before making recommendations. Is it worth building a curses-
based interface to updating the values? That would be really
helpful for people with only ssh access to their server, but barring
something like WCurses I'm not sure it would help on Windows.
-How about a GUI one with Python's Tkinter interface? Now Windows
isn't a problem, but people using ssh aren't going to be as happy.
Create an interface and a module/library that doesn't reflect the GUI
1:1, and you'll be able to create an interface in any language/toolkit.
The idea's great, I certainly know many people would love to see
something like that. Because so far, the whole 'adjust settings in
postgresql.conf' thing is guesswork. It would be even better to put
some work into auto-tuning of pgsql itself, with manual override in
postgresql.conf :) (kinda like MSSQL).
Hannu Krosing escreveu:
On Sat, 2008-11-01 at 22:02 -0400, Greg Smith wrote:
Possible feedback topics:
-Setting the next round of values requires asking the user for some input
before making recommendations. Is it worth building a curses-based
interface to updating the values? That would be really helpful for people
with only ssh access to their server, but barring something like WCurses
I'm not sure it would help on Windows.
-How about a GUI one with Python's Tkinter interface? Now Windows isn't a
problem, but people using ssh aren't going to be as happy.
You could have it in two parts - an agent running on the server, started
over ssh, and a GUI interface running on the user's workstation which
talks to said agent.
And I'd suggest you use wxPython for the GUI part if you want a relatively
modern look.
Or the GUI could be a part of pgAdmin.
--
[]s
Dickson S. Guedes
Administrador de Banco de Dados
Projeto Colmeia - Florianópolis, SC
(48) 3322-1185, ramal: 26
http://planeta.postgresql.org.br
http://makeall.wordpress.com
On Mon, 3 Nov 2008, Hannu Krosing wrote:
And I'd suggest you use wxPython for GUI part if you want a relatively
modern look.
wxPython is GPL licensed and not popular enough to be available on a lot
of systems already. I've spent enough time fighting with installing
wxWidgets (for pgAdmin) to know that I really don't want to push that
install onto end-users, and due to the license it's impractical to just
bundle in some situations.
There's no need for a fancy look here anyway, the only one I ever
envisioned was a simple grid showing the recommendations the user could
update before accepting. In general here, if it doesn't ship with the
stock Python, there would have to be a really, really compelling reason to
use any external library that adds more dependencies.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, 2008-11-03 at 12:42 -0500, Greg Smith wrote:
On Mon, 3 Nov 2008, Hannu Krosing wrote:
And I'd suggest you use wxPython for the GUI part if you want a relatively
modern look.
wxPython is GPL licensed and not popular enough to be available on a lot
of systems already.
Wikipedia says thus about wxPython
----------------------------------
Being a wrapper, wxPython uses the same free software licence used by
wxWidgets (LGPL) —which is approved by Free Software Foundation and Open
Source Initiative.
and wxWindows itself seems to be even more relaxed
--------------------------------------------------
http://www.wxwidgets.org/about/newlicen.htm
wxWidgets is currently licensed under the "wxWindows Licence" pending
approval of the "wxWidgets Licence" which will be identical apart from
the name.
The wxWindows Licence is essentially the L-GPL (Library General Public
Licence), with an exception stating that derived works in binary form
may be distributed on the user's own terms. This is a solution that
satisfies those who wish to produce GPL'ed software using wxWidgets, and
also those producing proprietary software.
I've spent enough time fighting with installing
wxWidgets (for pgAdmin) to know that I really don't want to push that
install onto end-users, and due to the license it's impractical to just
bundle in some situations.
I don't think that GPL forbids bundling.
There's no need for a fancy look here anyway, the only one I ever
envisioned was a simple grid showing the recommendations the user could
update before accepting. In general here, if it doesn't ship with the
stock Python, there would have to be a really, really compelling reason to
use any external library that adds more dependencies.
I think that though Tkinter is kind-of included with the standard python
distribution, it is not always installed by default, or more exactly,
tcl/tk is often not installed.
In that regard I think that using curses is the safest bet.
--
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
On Sun, 2 Nov 2008, Josh Berkus wrote:
I'd start with command-line switches, e.g.
config --memory=32GB --type=DW --size=500GB --connections=20
Attached version takes all its input via command line switches. If you
don't specify an explicit number of connections, it also implements setting
max_connections via some of the logic from your calcfactors spreadsheet.
I think using more of the ideas from there will drive me crazy unless I
put that into a text file it pulls in instead of a bunch of if statements.
I'm not sure if that much of the complexity of that larger model is
warranted at this point; could use a second opinion on that.
Here's the syntax it accepts now:
$ ./pg-generate-conf --help
Usage: pg-generate-conf [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-i INPUTCONFIG, --input-config=INPUTCONFIG
Input configuration file
-o OUTPUTCONFIG, --output-config=OUTPUTCONFIG
Output configuration file, defaults to standard output
-M TOTALMEMORY, --memory=TOTALMEMORY
Total system memory, will attempt to detect if
unspecified
-T DBTYPE, --type=DBTYPE
Database type, defaults to Mixed, valid options are
DSS, OLTP, Web, Mixed, Desktop
-c CONNECTIONS, --connections=CONNECTIONS
Maximum number of expected connections, default
depends on database type
-D, --debug Enable debugging mode
I've realized I need to duplicate all the memory setting parsing stuff
from the GUC code (as well as the code in SHOW that picks a reasonable
output size) as my next step here, that will be in my next update to this
program.
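The GUC-style memory handling being described here can be sketched as follows: parse values like "32MB" into kB, and render kB back using the largest unit that divides evenly, the way SHOW does. The unit table (kB/MB/GB, 1024-based, kB default) matches postgresql.conf's memory units; the function names are hypothetical:

```python
# Sketch of GUC-style memory value handling: parse postgresql.conf
# memory strings into kB, and format kB back with the largest unit
# that divides evenly (mimicking what SHOW does for output).
import re

UNITS = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}

def parse_memory(value):
    """Parse a postgresql.conf memory value like '32MB' into kB."""
    m = re.match(r"^\s*(\d+)\s*(kB|MB|GB)?\s*$", value)
    if not m:
        raise ValueError("bad memory value: %r" % value)
    number, unit = int(m.group(1)), m.group(2) or "kB"
    return number * UNITS[unit]

def format_memory(kb):
    """Render kB using the largest unit that divides evenly."""
    for unit in ("GB", "MB"):
        if kb % UNITS[unit] == 0:
            return "%d%s" % (kb // UNITS[unit], unit)
    return "%dkB" % kb

print(parse_memory("32MB"))
print(format_memory(32768))
```

Note the real GUC code also handles per-variable default units (e.g. shared_buffers is in 8kB pages when given a bare number); that wrinkle is omitted here.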
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Attachments:
pg-generate-config-v2 (text/plain, US-ASCII)
Greg,
Attached version takes all its input via command line switches. If you
don't specify an explicit number of connections, it also implements
setting max_connections via some of the logic from your calcfactors
spreadsheet.
OK, I'll review. What follows is a review of the *previous* version,
because I'm currently on the road and didn't see your message to
-hackers. Some of the information in the review will still be relevant;
for one thing, I've simplified the "what color is your application"
logic to a few calculations.
----------------------------
Review of simple_config.py:
1) don't bother with os.sysconf, or make it optional and error-trap it.
Instead, solicit the following information from the user:
-- Available RAM
-- Expected Database Size (to nearest 10x)
-- Type of Application
-- Web
-- Data Warehouse
-- Mixed
-- Desktop
-- Operating System [Linux/Windows/OSX/Solaris/FreeBSD/other]
From the above, you can derive all necessary calculations for the
basics. In the advanced version, we'll also want to ask:
-- Memory used by other applications on the system?
-- Analyze queries for performance?
-- SSL?
-- Production vs. Development status?
-- How many connections?
-- Logging setup:
Syslog
Analyze Performance
Private log with weekly rotation
2) It's completely unnecessary to account for OS overhead. This can and
should be taken into account as part of the calculations for other
figures. For example, my 1/4 and 3/4 calculations ignore OS overhead.
You only need to reduce Available RAM when the server will be running
something else, like a J2EE server or multiple databases.
3) You need to provide a whole bunch more values; shared_buffers and
effective_cache_size aren't nearly enough. We should also provide the
following, based on these calculations and on database type.
(I'm happy to argue out the figures below. They are simply based on my
direct tuning experience with a variety of databases and could probably
use more tweaking for the general case.)
web / oltp
listen_addresses = '*'
max_connections = 200
shared_buffers = 1/4 AvRAM
effective_cache_size = 3/4 AvRAM
work_mem = AvRAM / max_connections, round down
maintenance_work_mem = AvRAM / 16, round up
wal_buffers = 8MB
autovacuum = on
max_fsm_pages = DBsize / PageSize / 8
checkpoint_segments = 8
default_statistics_target = 10
constraint_exclusion = off
DW:
listen_addresses = '*'
max_connections = 20
shared_buffers = 1/4 AvRAM
effective_cache_size = 3/4 AvRAM
work_mem = AvRAM / max_connections / 2, round down
maintenance_work_mem = AvRAM / 8, round up
wal_buffers = 32MB
autovacuum = off
max_fsm_pages = DBsize / PageSize / 32*
(unless LoadSize is known)
checkpoint_segments = 64
default_statistics_target = 400
constraint_exclusion = on
Mixed:
listen_addresses = '*'
max_connections = 80
shared_buffers = 1/4 AvRAM
effective_cache_size = 3/4 AvRAM
work_mem = AvRAM / max_connections / 2, round down
maintenance_work_mem = AvRAM / 16, round up
wal_buffers = 8MB
autovacuum = on
max_fsm_pages = DBsize / PageSize / 8
checkpoint_segments = 16
default_statistics_target = 100
constraint_exclusion = on
Desktop:
listen_addresses = 'localhost'
max_connections = 5
shared_buffers = 1/16 AvRAM
effective_cache_size = 1/4 AvRAM
work_mem = AvRAM / 32, round down
maintenance_work_mem = AvRAM / 16, round up
wal_buffers = 1MB
autovacuum = on
max_fsm_pages = DBsize / PageSize / 8
checkpoint_segments = 3
default_statistics_target = 10
constraint_exclusion = off
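Taken together, the four profiles reduce to a small data-driven calculator. A sketch covering a subset of the settings above (formulas copied from the lists as written; "round up" cases simplified to integer division; all sizes in kB, 8kB pages assumed):

```python
# Data-driven version of the per-type recommendation tables above.
# Only some settings shown; rounding is simplified to floor division.
def recommend(db_type, av_ram_kb, db_size_kb):
    """db_type is one of: web, dw, mixed, desktop."""
    max_conn = {"web": 200, "dw": 20, "mixed": 80, "desktop": 5}[db_type]
    if db_type == "desktop":
        shared = av_ram_kb // 16           # 1/16 AvRAM
        cache = av_ram_kb // 4             # 1/4 AvRAM
        work_mem = av_ram_kb // 32
    else:
        shared = av_ram_kb // 4            # 1/4 AvRAM
        cache = 3 * av_ram_kb // 4         # 3/4 AvRAM
        work_mem = av_ram_kb // max_conn   # web: AvRAM / max_connections
        if db_type in ("dw", "mixed"):
            work_mem //= 2                 # DW and Mixed halve it
    fsm_divisor = 32 if db_type == "dw" else 8
    return {
        "max_connections": max_conn,
        "shared_buffers": shared,
        "effective_cache_size": cache,
        "work_mem": work_mem,
        # max_fsm_pages = DBsize / PageSize / divisor, with 8kB pages
        "max_fsm_pages": db_size_kb // 8 // fsm_divisor,
    }

print(recommend("web", av_ram_kb=8 * 1024 * 1024, db_size_kb=50 * 1024 * 1024))
```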
4) Because this comes up so often, we should output to a separate file
a set of sysctl.conf lines to support SysV memory, depending on OS.
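Point 4 could be as simple as deriving shmmax/shmall from the shared_buffers figure. In the sketch below, the 25% headroom factor is purely my assumption, not a PostgreSQL rule:

```python
# Derive Linux SysV shared memory sysctl lines from the shared_buffers
# recommendation: kernel.shmmax is in bytes, kernel.shmall in OS pages.
def sysctl_lines(shared_buffers_bytes, page_size=4096):
    # 25% headroom over shared_buffers for PostgreSQL's other shared
    # structures (an assumed fudge factor, not an official formula).
    shmmax = int(shared_buffers_bytes * 1.25)
    shmall = shmmax // page_size
    return ["kernel.shmmax = %d" % shmmax,
            "kernel.shmall = %d" % shmall]

for line in sysctl_lines(2 * 1024 ** 3):  # e.g. shared_buffers = 2GB
    print(line)
```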
Josh Berkus <josh@agliodbs.com> writes:
DW:
default_statistics_target = 400
Mixed:
default_statistics_target = 100
You, my friend, are certifiably insane.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark wrote:
Josh Berkus <josh@agliodbs.com> writes:
DW:
default_statistics_target = 400
Mixed:
default_statistics_target = 100
You, my friend, are certifiably insane.
Hmmm? Why? I've used those settings in the field, fairly frequently.
I was actually wondering if we should raise the default for web as well,
but decided to let it alone.
Actually, I think a DW should begin at 400; often it needs to go up to
1000, but I don't think a script should do that.
--Josh
Greg,
BTW, I think this is still in enough flux that we really ought to make
it a pgfoundry project. I don't think we'll have anything ready for 8.4
contrib.
--Josh
Josh Berkus <josh@agliodbs.com> writes:
Gregory Stark wrote:
Josh Berkus <josh@agliodbs.com> writes:
DW:
default_statistics_target = 400
Mixed:
default_statistics_target = 100
You, my friend, are certifiably insane.
Hmmm? Why? I've used those settings in the field, fairly frequently.
Even though we all agree default_statistics_target = 10 is too low,
proposing a 40X increase in the default value requires more evidence
than this. In particular, the prospect of a 1600-fold increase in
the typical cost of eqjoinsel() is a mite scary.
regards, tom lane
A statistics target of 400 for a specific column may make sense but
even then I would recommend monitoring performance to ensure it
doesn't cause problems. As a global setting it's, IMHO, ridiculous.
Even for the smaller data types (except boolean and "char") an array
of 400 will be large enough to be toasted. Planning queries will
involve many more disk I/Os than some of those queries end up taking
themselves. Even for stats which are already cached there are some
algorithms in the planner known to be inefficient for large arrays.
It may make sense for specific skewed columns with indexes on them,
but keep in mind postgres needs to consult the statistics on any
column referenced in a qual even if there are no indexes and for most
data distributions do fine with a target of 10.
I think we all agree the default may need to be raised but until there
is some data we have little basis to recommend anything specific.
I would suggest starting from the basis that "mixed" (with a
conservative memory setting) is the same as "Postgres default".
Perhaps (probably) the defaults should be changed but we shouldn't
have two different tools with different (drastically different!) ideas
for the same situation.
greg
On 13 Nov 2008, at 07:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
Gregory Stark wrote:
Josh Berkus <josh@agliodbs.com> writes:
DW:
default_statistics_target = 400
Mixed:
default_statistics_target = 100
You, my friend, are certifiably insane.
I almost fell off the chair because of that comment, but after I stopped
laughing and actually looked at those values, it doesn't seem that
unreasonable. Arbitrary, sure, but not insane. Or do I need stronger
glasses?
A lot of people have suggested raising our default_statistics_target,
and it has been rejected because there's some O(n^2) behavior in the
planner, and it makes ANALYZE slower, but it's not that crazy.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com