PostgreSQL Performance Tuning
Dear list,
We are facing a performance problem while using a PostgreSQL database
over the network on a Linux OS.
Our database consists of more than 500 tables, with an average of 10K
records per table and an average of 20 users accessing the database
simultaneously over the network. Each table has indexes, and we query the
database using Hibernate.
The biggest problem is with inserting, updating and fetching records;
the database is very slow and takes a long time to respond in the
above scenario.
Please advise us on tuning the database. I am attaching my
postgresql.conf file for reference on our current configuration.
Please reply ASAP.
Regards,
Shohab Abdullah
Software Engineer,
Manufacturing SBU-POWAI
Larsen and Toubro Infotech Ltd.| 4th floor, L&T Technology Centre,
Saki Vihar Road, Powai, Mumbai-400072
Tel: +91-22-67767366 | Mobile: +91-9870247322
Visit us at : http://www.lntinfotech.com
"I cannot predict future, I cannot change past, I have just the present
moment, I must treat it as my last"
Shohab Abdullah wrote:
<snip>
Please provide me with the tuning of the database. I am attaching my
*postgresql.conf* file for the reference of our current configuration
Have you changed _anything_ from the defaults? The defaults are set so
PG will run on as many installations as practical. They are not set for
performance - that is specific to your equipment, your data, and how you
need to handle the data. Assuming the record sizes aren't huge, that's
not a very large data set nor number of users.
Look at these for starters:
http://www.varlena.com/GeneralBits/Tidbits/perf.html
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html
You might try setting the logging parameters to log queries longer than
"x" (where x is in milliseconds - you will have to decide the
appropriate value for "too long") and start looking into those first.
Make sure that you are running "analyze" if it is not being run by
autovacuum.
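Concretely, those two pieces of advice might look like this in postgresql.conf (the 250 ms threshold is only an illustrative starting point; pick a value that matches "too long" for your workload):

```ini
# Log any statement that runs longer than 250 ms.
# -1 disables this logging, 0 logs every statement.
log_min_duration_statement = 250

# Keep planner statistics fresh. If autovacuum is off in your version,
# schedule a periodic ANALYZE (or VACUUM ANALYZE) from cron instead.
autovacuum = on
```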
Use "EXPLAIN <your query>" to see how the query is being planned - as a
first pass, assume that on any reasonably sized table the words
"sequential scan" mean "fix this". Note that you may have to cast
variables in a query to match the type of the indexed column in order for
the planner to figure out that it can use the index.
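For instance (the table and column names here are made up for illustration):

```sql
-- Show the chosen plan; a "Seq Scan" on a large table usually needs attention.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If customer_id is a bigint, older planners will not use the index for a
-- plain integer literal; an explicit cast lets the types match:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42::bigint;
```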
Read the guidelines then take an educated stab at some settings and see
how they work - other than turning off fsync, there's not much in
postgresql.conf that will put your data at risk.
Cheers,
Steve
Steve Crawford wrote:
Have you changed _anything_ from the defaults? The defaults are set so
PG will run on as many installations as practical. They are not set for
performance - that is specific to your equipment, your data, and how you
need to handle the data.
Is this really the sensible thing to do? I know we should not encourage
the world we're living in even more in the ways of "have the computer
do everything for us so that we don't need to have even a clue about what
we're doing" ... But wouldn't it make sense for the configure script to
determine the amount of physical memory, and perhaps even run a HD speed
estimate, to set up defaults that are closer to a performance-optimized
configuration?
Then, perhaps command switches so that you could specify the type of
access you estimate for your system. Perhaps something like:
./configure --db-size=100GB --write-percentage=20 .... etc.
(switch write-percentage above indicates that we estimate that 20% of
the DB activity would be writing to the disk --- there may be other
switches to indicate the percentage of queries that are transactions,
the percentage of queries that are complex; percentage that require
index usage, etc. etc. etc.)... And then, based on that, a better set of
defaults could be set by the configuration script.
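To make the idea concrete, here is a toy sketch of how such estimate switches might be folded into defaults -- every switch name, formula, and value below is invented purely for illustration, not taken from any real tuning guide:

```python
def suggest_defaults(db_size_gb: float, write_percentage: float) -> dict:
    """Map rough workload estimates to hypothetical starting settings."""
    settings = {}
    # A write-heavy workload gets more WAL buffering (numbers are invented).
    settings["wal_buffers_kb"] = 512 if write_percentage >= 20 else 64
    # Bigger databases get bigger shared buffers, within a modest ceiling.
    settings["shared_buffers_mb"] = min(1024, max(32, int(db_size_gb * 10)))
    return settings

print(suggest_defaults(db_size_gb=100, write_percentage=20))
```

The point is not the particular formulas but that a handful of coarse estimates could be turned into something better than one-size-fits-all defaults.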
Does this make sense? Or perhaps I'm watching too much science
fiction?
Carlos
--
Carlos Moreno <moreno_pg@mochima.com> writes:
... But, wouldn't it make sense that the configure script
determines the amount of physical memory and perhaps even do a HD
speed estimate to set up defaults that are closer to a
performance-optimized
configuration?
No. Most copies of Postgres these days are executed on machines very
far away from where the code was built. It's a little bit safer to
try to tune things at initdb time ... as indeed we already do. But
the fundamental problem remains that we don't know that much about
how the installation will be used. For example, the planner
configuration parameters turn out to have not that much to do with the
absolute speed of your drive, and a whole lot to do with the ratio
of the size of your database to the amount of RAM you've got; and the
ultimate size of the DB is one thing initdb certainly can't guess.
Also, there is an extremely good reason why Postgres will never be set
up to try to take over the whole machine by default: most of the
developers run multiple postmasters on their machines.
regards, tom lane
Tom Lane wrote:
Carlos Moreno <moreno_pg@mochima.com> writes:
... But, wouldn't it make sense that the configure script
determines the amount of physical memory and perhaps even do a HD
speed estimate to set up defaults that are closer to a
performance-optimized
configuration?
No. Most copies of Postgres these days are executed on machines very
far away from where the code was built. It's a little bit safer to
try to tune things at initdb time ... as indeed we already do.
D'oh! Yes, that makes more sense, of course.
But
the fundamental problem remains that we don't know that much about
how the installation will be used.
Notice that the second part of my suggestion covers this --- have
additional switches to initdb so that the user can tell it about
estimates of how the DB will be used: estimated size of the DB, estimated
percentage of activity that will involve writing, estimated percentage of
activity that will be transactions, percentage that will use indexes,
percentage of queries that will be complex, etc.
Wouldn't initdb be able to do a better job of coming up with sensible
defaults if it could count on this information? Of course, all these
parameters would have their own defaults --- the user won't necessarily
know or have an accurate estimate for each and every one of them.
Also, there is an extremely good reason why Postgres will never be set
up to try to take over the whole machine by default: most of the
developers run multiple postmasters on their machines.
Wouldn't this be covered by the above suggestion? One of the switches
for the initdb command could allow the user to specify how many instances
will be run (I assume you're talking about having different instances
listening on different ports for increased concurrency-related benefits?).
Does my suggestion make more sense now? Or is it still too unrealistic to
make it work properly/safely?
Carlos
--
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional switches to initdb so that the user can tell it about
estimates of how the DB will be used: estimated size of the DB, estimated
percentage of activity that will involve writing, estimated percentage of
activity that will be transactions, percentage that will use indexes,
percentage of queries that will be complex, etc.
If the person knows all that, why wouldn't they know to just change the
config parameters?
Mike Stone
Carlos Moreno <moreno_pg@mochima.com> writes:
Tom Lane wrote:
But the fundamental problem remains that we don't know that much about
how the installation will be used.
Notice that the second part of my suggestion covers this --- have
additional switches to initdb
That's been proposed and rejected before, too; the main problem being
that initdb is frequently a layer or two down from the user (eg,
executed by initscripts that can't pass extra arguments through, even
assuming they're being invoked by hand in the first place).
regards, tom lane
Maybe he's looking for a switch for initdb that would make it
interactive and quiz you about your expected usage-- sort of a magic
auto-configurator wizard doohicky? I could see that sort of thing being
nice for the casual user or newbie who otherwise would have a horribly
mis-tuned database. They could instead have only a marginally mis-tuned
database :)
Hello.
Just my 2 cents, and not looking at the technical aspects:
setting up PSQL is the weakest point of PSQL, as we have experienced
ourselves; once it is running it is great.
I can imagine that a lot of people stop after their first trials, once
they have experienced the trouble and performance of a standard setup.
Each of those is of course a lost user forever.
So anything that could be done to provide an easier and BETTER setup
would strongly enhance PSQL.
My 2 cents.
Henk Sanders
On Fri, Apr 27, 2007 at 07:36:52AM -0700, Mark Lewis wrote:
Maybe he's looking for a switch for initdb that would make it
interactive and quiz you about your expected usage-- sort of a magic
auto-configurator wizard doohicky? I could see that sort of thing being
nice for the casual user or newbie who otherwise would have a horribly
mis-tuned database. They could instead have only a marginally mis-tuned
database :)
However you implement it, anyone who can answer all of those questions
is probably capable of reading and understanding the performance section
in the manual.
It's probably more practical to have a separate script that looks at the
running system (RAM, disks, pg config, db size, indices, stats, etc.)
and makes suggestions -- if someone wants to write such a thing.
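For what it's worth, the advisory core of such a script could be tiny. A hedged sketch (the stat names and thresholds are invented for illustration; a real version would pull live values from pg_settings and the pg_stat views):

```python
def advise(stats: dict) -> list:
    """Turn a snapshot of system/DB stats into human-readable suggestions."""
    tips = []
    # If the database dwarfs shared_buffers, the cache is probably too small.
    if stats["db_size_mb"] > 10 * stats["shared_buffers_mb"]:
        tips.append("shared_buffers looks small relative to the database size")
    # Stale planner statistics lead to bad plans.
    if stats["days_since_analyze"] > 7:
        tips.append("statistics are stale; run ANALYZE or enable autovacuum")
    return tips

print(advise({"db_size_mb": 5000, "shared_buffers_mb": 32,
              "days_since_analyze": 30}))
```

Because it only prints advice, it sidesteps the objections to changing a running server's configuration behind the DBA's back.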
Mike Stone
On Apr 27, 2007, at 3:30 PM, Michael Stone wrote:
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional switches to initdb so that the user can tell it about
estimates of how the DB will be used: estimated size of the DB, estimated
percentage of activity that will involve writing, estimated percentage of
activity that will be transactions, percentage that will use indexes,
percentage of queries that will be complex, etc.
If the person knows all that, why wouldn't they know to just change
the config parameters?
Because knowing your expected workload is a lot easier for many
people than knowing what every GUC does.
Personally, I think it would be a tremendous start if we just
provided a few sample configs like MySQL does. Or if someone wanted
to get fancy they could stick a web page somewhere that would produce
a postgresql.conf based simply on how much available RAM you had,
since that's one of the biggest performance-hampering issues we run
into (ie: shared_buffers left at the default of 32MB).
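A sketch of such a RAM-only generator, using common rules of thumb of the era (roughly a quarter of RAM for shared_buffers, three quarters for effective_cache_size -- these proportions are illustrative, not official recommendations):

```python
def conf_from_ram(ram_mb: int) -> str:
    """Emit a minimal postgresql.conf fragment from total RAM alone."""
    shared_buffers = ram_mb // 4          # ~25% of RAM for PG's buffer cache
    effective_cache = (ram_mb * 3) // 4   # ~75%: what the OS is likely to cache
    return ("shared_buffers = %dMB\n" % shared_buffers
            + "effective_cache_size = %dMB\n" % effective_cache)

print(conf_from_ram(4096))
```

Even something this crude would beat leaving shared_buffers at a tiny default on a machine with gigabytes of RAM.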
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Michael Stone wrote:
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional
switches to initdb
<snip>
If the person knows all that, why wouldn't they know to just change the
config parameters?
Exactly.. What I think would be much more productive is to use the great amount
of information that PG tracks internally and auto-tune the parameters based on
it. For instance:
Why does the user need to manually track max_fsm_pages and max_fsm_relations? I
bet there are many users who have never taken the time to understand what these
mean and wonder why performance still stinks after vacuuming their database
(spoken from my own experience).
How about work_mem? shared_buffers? column statistics sizes? random_page_cost?
Couldn't some fairly simple regression tests akin to a VACUUM process spot
potential problems? "Hey, it looks like you need more fsm_relations.. I bumped
that up automatically for you". Or "These indexes look bloated, shall I
automatically reindex them for you?"
I'm sure there are many more examples that, with some creative thinking, could
be auto-adjusted to match the usage patterns of the database. PG does an
excellent job of exposing the variables to the users, but mostly avoids telling
the user what to do or doing it for them. Instead, it is up to the user to know
where to look, what to look for, and how to react to things to improve
performance. This is not all bad, but it assumes that all users are hackers
(which used to be true), and it certainly doesn't help when the average
SQL Server admin tries out Postgres and is surprised at the things they are now
responsible for managing. PG is certainly *not* the only database to suffer
from this syndrome, I know.
I like to think of my systems as good employees. I don't want to have to
micromanage everything they do. I want to tell them "here's what I want done",
and assuming I made a good hiring choice, they will do it and take some liberty
to adjust parameters where needed to achieve the spirit of the goal, rather than
blindly do something inefficiently because I failed to explain to them the
absolute most efficient way to accomplish the task.
Granted, there are some people who don't like the developers making any
assumptions about their workload. But this doesn't have to be an either/or
proposition. I don't think any control needs to be abandoned. But
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
the patch" ). I just don't know if this feeling has resonated well between new
users and long-term developers. I know it must be grating to have to answer the
same questions over and over and over "have you analyzed? Did you leave
postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO.
In closing, I am not bashing PG! I love it and swear by it. These comments are
purely from an advocacy perspective. I'd love to see the PG user base continue
to grow.
My .02
-Dan
Dan,
Exactly.. What I think would be much more productive is to use the
great amount of information that PG tracks internally and auto-tune the
parameters based on it. For instance:
*Everyone* wants this. The problem is that it's very hard code to write
given the number of variables. I'm working on it but progress is slow,
due to my travel schedule.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
In response to Dan Harris <fbsd@drivefaster.net>:
Michael Stone wrote:
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional switches to initdb
<snip>
If the person knows all that, why wouldn't they know to just change the
config parameters?
Exactly.. What I think would be much more productive is to use the great amount
of information that PG tracks internally and auto-tune the parameters based on
it. For instance:
Why does the user need to manually track max_fsm_pages and max_fsm_relations? I
bet there are many users who have never taken the time to understand what these
mean and wonder why performance still stinks after vacuuming their database
(spoken from my own experience).
But there are two distinct routes that can be taken if there's not enough
fsm space: add fsm space or vacuum more frequently. I don't want the system
to eat up a bunch of memory for fsm entries if my workload indicates that
I can easily vacuum more frequently.
How about work_mem? shared_buffers? column statistics sizes? random_page_cost?
The only one that seems practical (to me) is random_page_cost. The others are
all configuration options that I (as a DBA) want to be able to decide for
myself. For example, I have some dedicated PG servers that I pretty much
max those values out at, to let PG know that it can use everything on the
system -- but I also have some shared use machines with PG, where I carefully
constrain those values so that PG doesn't muscle other daemons out of their
share of the RAM (work_mem is probably the best example)
It would be nice to have some kind of utility that could tell me what
random_page_cost should be, as I've never felt comfortable tweaking it.
Like some utility to run that would say "based on the seek tests I just
ran, you should set random_page_cost to x". Of course, if such a thing
existed, it could just fill in the value for you. But I haven't figured
out how to pick a good value for that setting, so I have no idea how to
suggest to have it automatically set.
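For illustration, the final step of such a utility might be no more than a clamped ratio. This assumes you have already timed N sequential and N random 8 kB reads of a large file; the genuinely hard part -- defeating OS caching and readahead during those reads -- is deliberately omitted here:

```python
def estimate_random_page_cost(seq_seconds: float, rand_seconds: float) -> float:
    """Suggest random_page_cost from two benchmark timings: the total time to
    do N sequential page reads vs. N random page reads of the same file.
    Clamped so one noisy benchmark can't produce something absurd."""
    ratio = rand_seconds / seq_seconds
    return max(1.0, min(ratio, 10.0))

print(estimate_random_page_cost(1.0, 4.0))  # near the classic default of 4
```

Whether such a raw seek/scan ratio is even the right number to feed the planner is debatable, as discussed further down this thread.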
Couldn't some fairly simple regression tests akin to a VACUUM process spot
potential problems? "Hey, it looks like you need more fsm_relations.. I bumped
that up automatically for you". Or "These indexes look bloated, shall I
automatically reindex them for you?"
A lot of that stuff does happen. A vacuum verbose will tell you what it
thinks you should do, but I don't _want_ it to do it automatically. What
if I create huge temporary tables once a week for some sort of analysis that
overload the fsm space? And if I'm dropping those tables when the analysis
is done, do I want the fsm space constantly adjusting?
Plus, some is just impossible. shared_buffers requires a restart. Do you
want your DB server spontaneously restarting because it thought more
buffers might be nice?
I'm sure there are many more examples, that with some creative thinking, could
be auto-adjusted to match the usage patterns of the database. PG does an
excellent job of exposing the variables to the users, but mostly avoids telling
the user what to do or doing it for them. Instead, it is up to the user to know
where to look, what to look for, and how to react to things to improve
performance. This is not all bad, but it is assuming that all users are hackers
( which used to be true ), but certainly doesn't help when the average SQLServer
admin tries out Postgres and then is surprised at the things they are now
responsible for managing. PG is certainly *not* the only database to suffer
from this syndrome, I know..
I expect the suffering is a result of the fact that databases are non-trivial
pieces of software, and there's no universally simple way to set them up
and make them run well.
I like to think of my systems as good employees. I don't want to have to
micromanage everything they do. I want to tell them "here's what I want done",
and assuming I made a good hiring choice, they will do it and take some liberty
to adjust parameters where needed to achieve the spirit of the goal, rather than
blindly do something inefficiently because I failed to explain to them the
absolute most efficient way to accomplish the task.
That's silly. No software does that. You're asking software to behave like
humans. If that were the case, this would be Isaac Asimov's world, not the
real one.
Granted, there are some people who don't like the developers making any
assumptions about their workload. But this doesn't have to be an either/or
proposition. I don't think any control needs to be abandoned. But
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
the patch" ). I just don't know if this feeling has resonated well between new
users and long-term developers. I know it must be grating to have to answer the
same questions over and over and over "have you analyzed? Did you leave
postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO.
Well, it seems like this is happening where it's practical -- autovacuum is
a good example.
Personally, I wouldn't be opposed to more automagic stuff, just as long as
I have the option to disable it. There are some cases where I still
disable autovac.
In closing, I am not bashing PG! I love it and swear by it. These comments are
purely from an advocacy perspective. I'd love to see PG user base continue to grow.
I expect that part of the problem is "who's going to do it?"
--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
At 10:36a -0400 on 27 Apr 2007, Tom Lane wrote:
That's been proposed and rejected before, too; the main problem being
that initdb is frequently a layer or two down from the user (eg,
executed by initscripts that can't pass extra arguments through, even
assuming they're being invoked by hand in the first place).
And following after Dan Harris' response . . .
So what's the problem with having some sort of cronjob contrib module
that utilizes the actual and current statistics to make
recommendations? I don't think it'd be right to simply change the
configuration options as it sees fit (especially as it was pointed
out that many run multiple postmasters or have other uses for the
machines in question), but perhaps it could send a message (email?)
along the lines of "Hey, I'm currently doing this many of X
transactions, against this much of Y data, and working under these
constraints. You might get better performance (in this area ... ) if
you altered the configuration options like so: ..."
Certainly not for the masters, but perhaps for standard-installation
sorts of deals, sort of like bringing up the rear . . . just a thought.
Kevin
On Fri, Apr 27, 2007 at 02:40:07PM -0400, Kevin Hunter wrote:
out that many run multiple postmasters or have other uses for the
machines in question), but perhaps it could send a message (email?)
along the lines of "Hey, I'm currently doing this many of X
transactions, against this much of Y data, and working under these
constraints. You might get better performance (in this area ... ) if
you altered the configuration options like so: ..."
Or store the values in the DB for later trending analysis -- witness
Oracle's Statspack.
Bill,
The only one that seems practical (to me) is random_page_cost. The
others are all configuration options that I (as a DBA) want to be able
to decide for myself.
Actually, random_page_cost *should* be a constant "4.0" or "3.5", which
represents the approximate ratio of seek to scan speed that has been
relatively constant across 6 years of HDD technology. The only reason we
make it a configuration variable is that there are defects in our cost model
which cause users to want to tinker with it.
Mind you, that's gotten better in recent versions as well. Lately I mostly
tinker with effective_cache_size and the various cpu_* stats rather than
modifying random_page_cost.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
Bill Moran wrote:
In response to Dan Harris <fbsd@drivefaster.net>:
<snip>
Why does the user need to manually track max_fsm_pages and max_fsm_relations? I
bet there are many users who have never taken the time to understand what these
mean and wonder why performance still stinks after vacuuming their database
(spoken from my own experience).
But there are two distinct routes that can be taken if there's not enough
fsm space: add fsm space or vacuum more frequently. I don't want the system
to eat up a bunch of memory for fsm entries if my workload indicates that
I can easily vacuum more frequently.
There's no magic bullet here, but heuristics should be able to tell us whether
you can "easily vacuum more frequently". And again, I said these things would
be *optional*. Like an item in postgresql.conf
"i_have_read_the_manual_and_know_what_this_all_means = false #default false".
If you change it to true, you have all the control you're used to and nothing
will get in your way.
How about work_mem? shared_buffers? column statistics sizes? random_page_cost?
The only one that seems practical (to me) is random_page_cost. The others are
all configuration options that I (as a DBA) want to be able to decide for
myself. For example, I have some dedicated PG servers that I pretty much
max those values out at, to let PG know that it can use everything on the
system -- but I also have some shared use machines with PG, where I carefully
constrain those values so that PG doesn't muscle other daemons out of their
share of the RAM (work_mem is probably the best example)
Just because you carefully constrain it does not preclude the ability for
program logic to maintain statistics to do what I suggested.
It would be nice to have some kind of utility that could tell me what
random_page_cost should be, as I've never felt comfortable tweaking it.
Like some utility to run that would say "based on the seek tests I just
ran, you should set random_page_cost to x". Of course, if such a thing
existed, it could just fill in the value for you. But I haven't figured
out how to pick a good value for that setting, so I have no idea how to
suggest to have it automatically set.
Me either, but I thought if there's a reason it's user-settable, there must be
some demonstrable method for deciding what is best.
Couldn't some fairly simple regression tests akin to a VACUUM process spot
potential problems? "Hey, it looks like you need more fsm_relations.. I bumped
that up automatically for you". Or "These indexes look bloated, shall I
automatically reindex them for you?"
A lot of that stuff does happen. A vacuum verbose will tell you what it
thinks you should do, but I don't _want_ it to do it automatically. What
if I create huge temporary tables once a week for some sort of analysis that
overload the fsm space? And if I'm dropping those tables when the analysis
is done, do I want the fsm space constantly adjusting?
I understand *you* don't want it done automatically. But my suspicion is that
there are a lot more newbie pg admins who would rather let the system do
something sensible as a default. Again, you sound defensive that somehow my
ideas would take power away from you. I'm not sure why that is, but certainly
I'm not suggesting that. An auto-pilot mode is not a bad idea just because a
few pilots don't want to use it.
Plus, some is just impossible. shared_buffers requires a restart. Do you
want your DB server spontaneously restarting because it thought more
buffers might be nice?
Well, maybe look at the bigger picture and see if it can be fixed to *not*
require a program restart? Or.. take effect on the next pid that gets created?
This is a current limitation, but doesn't need to be one for eternity does it?
I'm sure there are many more examples, that with some creative thinking, could
be auto-adjusted to match the usage patterns of the database. PG does an
excellent job of exposing the variables to the users, but mostly avoids telling
the user what to do or doing it for them. Instead, it is up to the user to know
where to look, what to look for, and how to react to things to improve
performance. This is not all bad, but it is assuming that all users are hackers
( which used to be true ), but certainly doesn't help when the average SQLServer
admin tries out Postgres and then is surprised at the things they are now
responsible for managing. PG is certainly *not* the only database to suffer
from this syndrome, I know..
I expect the suffering is a result of the fact that databases are non-trivial
pieces of software, and there's no universally simple way to set them up
and make them run well.
Speaking as a former SQL Server admin ( from day 1 of the Sybase fork up to
version 2000 ), I can say there *is* a way to make them simple. It's certainly
not a perfect piece of software, but the learning curve speaks for itself. It
can auto-shrink your databases ( without locking problems ). Actually it pretty
much runs itself. It auto-allocates RAM for you ( up to the ceiling *you*
control ). It automatically re-analyzes itself.. I was able to successfully
manage several servers with not insignificant amounts of data in them for many
years without being a trained DBA. After switching to PG, I found myself having
to twiddle with all sorts of settings that it seemed the system should just
have known about without me having to tell it.
I'm not saying it was simple to make it do that. MS has invested LOTS of money
and effort into making it that way. I don't expect PG to have features like
that tomorrow or even next release. But, I feel it's important to make sure
that those who *can* realistically take steps in that direction understand this
point of view ( and with Josh's other reply to this, I think many do ).
I like to think of my systems as good employees. I don't want to have to
micromanage everything they do. I want to tell them "here's what I want done",
and assuming I made a good hiring choice, they will do it and take some liberty
to adjust parameters where needed to achieve the spirit of the goal, rather than
blindly do something inefficiently because I failed to explain to them the
absolute most efficient way to accomplish the task.
That's silly. No software does that. You're asking software to behave like
humans. If that were the case, this would be Isaac Asimov's world, not the
real one.
It's not silly. There are plenty of systems that do that. Maybe you just
haven't used them. Again, SQL Server did a lot of those things for me. I
didn't have to fiddle with checkboxes or multi-select tuning options. It
learned what its load was and reacted appropriately. I never had to stare at
planner outputs and try to figure out why the heck it chose that plan.
Although, I certainly could have if I wanted to. It has a tool called the SQL
Profiler which will "watch" your workload on the database, do regression testing
and suggest ( and optionally implement with a single click ) indexes on your
tables. I've been wanting to do this for years with PG, and had a small start
on a project to do just that actually.
Granted, there are some people who don't like the developers making any
assumptions about their workload. But this doesn't have to be an either/or
proposition. I don't think any control needs to be abandoned. But
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
the patch" ). I just don't know if this feeling has resonated well between new
users and long-term developers. I know it must be grating to have to answer the
same questions over and over and over "have you analyzed? Did you leave
postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO.

Well, it seems like this is happening where it's practical -- autovacuum is
a good example.
Agreed, this is a huge step forward. And again, I'm not taking an offensive
posture on this. Just that I think it's worth giving my .02 since I have had
strong feelings about this for awhile.
Personally, I wouldn't be opposed to more automagic stuff, just as long as
I have the option to disable it. There are some cases where I still
disable autovac.

In closing, I am not bashing PG! I love it and swear by it. These comments are
purely from an advocacy perspective. I'd love to see the PG user base continue to grow.

I expect that part of the problem is "who's going to do it?"
Yes, this is the classic problem. I'm not demanding anyone pick up the ball and
jump on this today, tomorrow, etc.. I just think it would be good for those who
*could* make a difference to keep those goals in mind when they continue. If
you have the right mindset, this problem will fix itself over time.
-Dan
Dan,
Yes, this is the classic problem. I'm not demanding anyone pick up the
ball and jump on this today, tomorrow, etc.. I just think it would be
good for those who *could* make a difference to keep those goals in mind
when they continue. If you have the right mindset, this problem will
fix itself over time.
Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took
Microsoft 3-4 years, and theirs still has major issues last I checked. And
both of those DBs support fewer OSes than we do. I think it's going to
take more than the *right mindset* and my spare time.
I appreciate your efforts in this regard. Do you have a formal project
plan for this? If you can share it with me, I'll take a look and see if
there is anything I can do to help out.
Nope, just some noodling around on the configurator:
www.pgfoundry.org/projects/configurator
I am on the verge of starting a Java UI that will query a bunch of the
pg_* tables and give the user information about wasted table space,
index usage, table scans, slow-running queries and spoon-feed it in a
nice attractive interface that can be a real-time system monitor tool.
This could be a cooperative project or might have some redundancy with
what you're up to.
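The catalog queries such a monitor would run can already be sketched today. Below is a rough Python sketch of the kind of checks described above (wasted space, index usage, table scans); the statistics views named are the standard pg_stat_* ones, but the thresholds and the exact column set are assumptions, not what any finished tool would ship with:

```python
# Sketch of pg_* catalog queries a monitoring UI might run.
# Assumes the stats collector is enabled; thresholds are illustrative only.

MONITOR_QUERIES = {
    # tables read mostly by sequential scans (candidates for new indexes)
    "seq_scan_heavy": """
        SELECT relname, seq_scan, idx_scan
        FROM pg_stat_user_tables
        WHERE seq_scan > 10 * COALESCE(idx_scan, 0)
        ORDER BY seq_scan DESC""",
    # indexes that are never used (wasted space and write overhead)
    "unused_indexes": """
        SELECT relname, indexrelname, idx_scan
        FROM pg_stat_user_indexes
        WHERE idx_scan = 0""",
}

def flag_seq_scan_heavy(rows, ratio=10):
    """Given (relname, seq_scan, idx_scan) rows, return the tables whose
    sequential scans outnumber index scans by more than `ratio`."""
    return [r[0] for r in rows if r[1] > ratio * (r[2] or 0)]
```

A UI could run these on a timer and highlight anything flagged; the same rows feed a "suggest an index here" hint list.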
I'd be *very* interested in collaborating with you on this. Further, we
could feed DTrace (& systemtap?) into the interface to get data that
PostgreSQL doesn't currently produce.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On Fri, 27 Apr 2007, Josh Berkus wrote:
Dan,
Yes, this is the classic problem. I'm not demanding anyone pick up the
ball and jump on this today, tomorrow, etc.. I just think it would be
good for those who *could* make a difference to keep those goals in mind
when they continue. If you have the right mindset, this problem will
fix itself over time.

Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took
Microsoft 3-4 years, and theirs still has major issues last I checked. And
both of those DBs support fewer OSes than we do. I think it's going to
take more than the *right mindset* and my spare time.
I think there are a couple different things here.
1. full autotuning
as you say, this is very hard and needs a lot of info about your
particular database usage.
2. getting defaults that are closer to right than the current ones.
this is much easier. for this nobody is expecting that the values are
right, we're just begging for some tool to get us within a couple orders
of magnitude of what's correct.
the current defaults are appropriate for a single CPU with 10's of MB of
RAM and a single drive.
nowadays you have people trying to run quick-and-dirty tests on some spare
hardware they have lying around (waiting for another project) that's got
4-8 CPUs with 10's of GB of RAM and a couple dozen drives.
these people don't know about database tuning. they can learn, but they
want to see if Postgres is even in the ballpark. if the results are close
to acceptable they will ask questions and research the tuning, but if the
results are orders of magnitude lower than they need to be they'll just
say that Postgres is too slow and try another database.
an autodefault script that was written assuming that Postgres has the box
to itself would be a wonderful start.
I think the next step would be to be able to tell the script 'only plan on
using 1/2 of this box',
and beyond that would be the steps that you are thinking of where the
usage pattern is considered.
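A first cut at such an autodefault script could be as small as scaling a few settings from physical RAM, with a fraction knob for the "only use half the box" case. The sketch below uses ratios that are common rules of thumb from this list, not project recommendations:

```python
def suggest_settings(total_ram_mb, fraction=1.0):
    """Suggest rough postgresql.conf values from physical RAM.
    `fraction` is the share of the box Postgres may plan on using.
    The ratios are rules of thumb, not official recommendations."""
    ram = total_ram_mb * fraction
    return {
        # ~25% of available RAM for shared buffers
        "shared_buffers": "%dMB" % max(32, int(ram * 0.25)),
        # tell the planner most of the rest is OS cache
        "effective_cache_size": "%dMB" % int(ram * 0.75),
        # per-sort memory; kept modest since many backends may sort at once
        "work_mem": "%dMB" % max(1, int(ram / 512)),
        "maintenance_work_mem": "%dMB" % max(16, int(ram / 16)),
    }

print(suggest_settings(4096))        # assume Postgres has the whole box
print(suggest_settings(4096, 0.5))   # "only plan on using 1/2 of this box"
```

Even something this crude would get a quick-and-dirty test within shouting distance of sane, which is all step 2 above asks for.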
but when every performance question is answered with "did you change the
defaults? they are way too low for modern hardware, raise them by 2 orders
of magnitude and then we'll start investigating", that first impression has
already been lost.
David Lang
Date: Sat, 28 Apr 2007 10:31:35 +0200
From: "Harald Armin Massa" <haraldarminmassa@gmail.com>
To: "Carlos Moreno" <moreno_pg@mochima.com>
Subject: Re: Feature Request --- was: PostgreSQL Performance Tuning
Cc: "PostgreSQL Performance" <pgsql-performance@postgresql.org>
Carlos,
about your feature proposal: as I learned, nearly all performance
configuration can be done by editing the postgresql.conf file and making the
postmaster re-read it.
So, WHY at all should those parameters be guessed at the installation of the
database? Wouldn't it be a safer point in time to have some postgresql-tune
utility, which gets run after the installation, and maybe every once in a
while? That tool can check vital information like the database-size-to-memory
ratio and suggest a new postgresql.conf.
That tool needs NO INTEGRATION whatsoever - it can be developed and deployed
totally independently and later only be bundled.
Does my suggestion make more sense now? Or is it still too unrealistic to
make it work properly/safely?
And as this tool can be tested separately, and does not need a new initdb
every time ... it can be developed more easily.
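The edit-and-re-read cycle such a tool needs is straightforward: rewrite the changed parameters in postgresql.conf, then signal the postmaster (e.g. via `pg_ctl reload`), which re-reads the file without a restart. A minimal sketch, where the data directory path and the parameter values are placeholders:

```python
# Sketch: rewrite selected parameters in postgresql.conf, then reload.
# The conf path and values are placeholders; `pg_ctl reload` sends the
# postmaster a SIGHUP so it re-reads the file without a restart.
import re
import subprocess

def apply_overrides(conf_text, overrides):
    """Return conf_text with each `name = value` line replaced
    (uncommenting it if needed), appending any name not found."""
    lines = conf_text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        m = re.match(r"\s*#?\s*(\w+)\s*=", line)
        if m and m.group(1) in overrides:
            lines[i] = "%s = %s" % (m.group(1), overrides[m.group(1)])
            seen.add(m.group(1))
    for name, value in overrides.items():
        if name not in seen:
            lines.append("%s = %s" % (name, value))
    return "\n".join(lines) + "\n"

def reload_postmaster(pgdata="/var/lib/pgsql/data"):  # placeholder path
    subprocess.check_call(["pg_ctl", "reload", "-D", pgdata])
```

Since the rewrite step is a pure text transformation, it can be unit-tested with no running server at all, which supports the point that this tool can be developed completely independently.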
Maybe there is even a pointy flashy version possible (perhaps even for money
:) which gives nice graphics and "optimizes", like those Windows Optimizers.
:) I am sure some DBAs in BIGCOMPs would be thrilled :)
May that be a possible way?
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
------=_Part_239023_22416331.1177749095813
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Carlos,<br><br>about your feature proposal: as I learned, nearly all Perfom=
ance.Configuration can be done by editing the .INI file and making the Post=
master re-read it.<br><br>So, WHY at all should those parameters be guessed=
at the installation of the database? Would'nt it be a saver point of t=
ime to have some postgresql-tune=20
<br>utilitiy, which gets run after the installation, maybe every once in a =
while. That tool can check vital information like "Databasesize to mem=
ory relation"; and suggest a new postgresql.ini.<br><br>That tool need=
s NO INTEGRATION whatsoever - it can be developed, deployed totally indepen=
dend and later only be bundled.=20
<br><br><div><blockquote class=3D"gmail_quote" style=3D"border-left: 1px so=
lid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Does=
my suggestion make more sense now? Or is it still too unrealist=
ic to<br>make it work properly/safely?
</blockquote><div><br></div></div>And as this tool can be tested seperately=
, does not need a new initdb every time ... it can be developed more easily=
. <br><br>Maybe there is even a pointy flashy version possible (perhaps eve=
n for money :) which gives nice graphics and "optimized", like th=
ose Windows Optimizers. :) I am sure, some DBAs in BIGCOMPs would be =
thrilled :)
<br><br>May that be a possible way?<br><br>Harald<br><br><br>-- <br>GHUM Ha=
rald Massa<br>persuadere et programmare<br>Harald Armin Massa<br>Reinsburgs=
tra=DFe 202b<br>70197 Stuttgart<br>0173/9409607<br>fx 01212-5-13695179 <br>
-<br>Python: the only language with more web frameworks than keywords.
------=_Part_239023_22416331.1177749095813--