PostgreSQL Performance Tuning
Dear list,
We are facing a performance problem while using a PostgreSQL database
over the network on a Linux OS.
Our database consists of more than 500 tables, with an average of 10K
records per table and an average of 20 users accessing the database
simultaneously over the network. Each table has indexes, and we query the
database using Hibernate.
The biggest problem is with inserting, updating and fetching records;
the database is very slow and takes a long time to respond in the
above scenario.
Please advise us on tuning the database. I am attaching my
postgresql.conf file for reference on our current configuration.
Please reply ASAP.
Regards,
Shohab Abdullah
Software Engineer,
Manufacturing SBU-POWAI
Larsen and Toubro Infotech Ltd.| 4th floor, L&T Technology Centre,
Saki Vihar Road, Powai, Mumbai-400072
Tel: +91-22-67767366 | Mobile: +91-9870247322
Visit us at : http://www.lntinfotech.com
"I cannot predict future, I cannot change past, I have just the present
moment, I must treat it as my last"
Shohab Abdullah wrote:
<snip>
Please provide me with the tuning of the database. I am attaching my
*postgresql.conf* file for the reference of our current configuration
Have you changed _anything_ from the defaults? The defaults are set so
PG will run on as many installations as practical. They are not set for
performance - that is specific to your equipment, your data, and how you
need to handle the data. Assuming the record sizes aren't huge, that's
not a very large data set nor number of users.
Look at these for starters:
http://www.varlena.com/GeneralBits/Tidbits/perf.html
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html
You might try setting the logging parameters to log queries longer than
"x" (where x is in milliseconds - you will have to decide the
appropriate value for "too long") and start looking into those first.
Make sure that you are running "analyze" if it is not being run by
autovacuum.
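Concretely, those two pieces of advice might look like this in postgresql.conf (the 250 ms threshold is only an illustrative starting point; pick a value that matches "too long" for your workload):

```ini
# Log any statement that runs longer than 250 ms.
# -1 disables this logging, 0 logs every statement.
log_min_duration_statement = 250

# Keep planner statistics fresh. If autovacuum is off in your version,
# schedule a periodic ANALYZE (or VACUUM ANALYZE) from cron instead.
autovacuum = on
```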
Use "EXPLAIN <your query>" to see how the query is being planned - as a
first pass, assume that on any reasonably sized table the words
"sequential scan" mean "fix this". Note that you may have to cast
variables in a query to match the type of the indexed column in order for
the planner to figure out that it can use the index.
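For instance (the table and column names here are made up for illustration):

```sql
-- Show the chosen plan; a "Seq Scan" on a large table usually needs attention.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If customer_id is a bigint, older planners will not use the index for a
-- plain integer literal; an explicit cast lets the types match:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42::bigint;
```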
Read the guidelines then take an educated stab at some settings and see
how they work - other than turning off fsync, there's not much in
postgresql.conf that will put your data at risk.
Cheers,
Steve
Steve Crawford wrote:
Have you changed _anything_ from the defaults? The defaults are set so
PG will run on as many installations as practical. They are not set for
performance - that is specific to your equipment, your data, and how you
need to handle the data.
Is this really the sensible thing to do? I know we should not encourage
the world we're living in even more in the ways of "have the computer
do everything for us so that we don't need to have even a clue about what
we're doing" ... But wouldn't it make sense for the configure script to
determine the amount of physical memory, and perhaps even run a HD speed
estimate, to set up defaults that are closer to a performance-optimized
configuration?
Then, perhaps command switches so that you could specify the type of
access you estimate for your system. Perhaps something like:
./configure --db-size=100GB --write-percentage=20 .... etc.
(switch write-percentage above indicates that we estimate that 20% of
the DB activity would be writing to the disk --- there may be other
switches to indicate the percentage of queries that are transactions,
the percentage of queries that are complex; percentage that require
index usage, etc. etc. etc.)... And then, based on that, a better set of
defaults could be set by the configuration script.
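To make the idea concrete, here is a toy sketch of how such estimate switches might be folded into defaults -- every switch name, formula, and value below is invented purely for illustration, not taken from any real tuning guide:

```python
def suggest_defaults(db_size_gb: float, write_percentage: float) -> dict:
    """Map rough workload estimates to hypothetical starting settings."""
    settings = {}
    # A write-heavy workload gets more WAL buffering (numbers are invented).
    settings["wal_buffers_kb"] = 512 if write_percentage >= 20 else 64
    # Bigger databases get bigger shared buffers, within a modest ceiling.
    settings["shared_buffers_mb"] = min(1024, max(32, int(db_size_gb * 10)))
    return settings

print(suggest_defaults(db_size_gb=100, write_percentage=20))
```

The point is not the particular formulas but that a handful of coarse estimates could be turned into something better than one-size-fits-all defaults.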
Does this make sense? Or perhaps I'm watching too much science
fiction?
Carlos
--
Carlos Moreno <moreno_pg@mochima.com> writes:
... But, wouldn't it make sense that the configure script
determines the amount of physical memory and perhaps even do a HD
speed estimate to set up defaults that are closer to a
performance-optimized
configuration?
No. Most copies of Postgres these days are executed on machines very
far away from where the code was built. It's a little bit safer to
try to tune things at initdb time ... as indeed we already do. But
the fundamental problem remains that we don't know that much about
how the installation will be used. For example, the planner
configuration parameters turn out to have not that much to do with the
absolute speed of your drive, and a whole lot to do with the ratio
of the size of your database to the amount of RAM you've got; and the
ultimate size of the DB is one thing initdb certainly can't guess.
Also, there is an extremely good reason why Postgres will never be set
up to try to take over the whole machine by default: most of the
developers run multiple postmasters on their machines.
regards, tom lane
Tom Lane wrote:
Carlos Moreno <moreno_pg@mochima.com> writes:
... But, wouldn't it make sense that the configure script
determines the amount of physical memory and perhaps even do a HD
speed estimate to set up defaults that are closer to a
performance-optimized
configuration?
No. Most copies of Postgres these days are executed on machines very
far away from where the code was built. It's a little bit safer to
try to tune things at initdb time ... as indeed we already do.
D'oh! Yes, that makes more sense, of course.
But
the fundamental problem remains that we don't know that much about
how the installation will be used.
Notice that the second part of my suggestion covers this --- have
additional switches to initdb so that the user can tell it about
estimates of how the DB will be used: estimated size of the DB, estimated
percentage of activity that will involve writing, estimated percentage of
activity that will be transactions, percentage that will use indexes,
percentage of queries that will be complex, etc.
Wouldn't initdb be able to do a better job of coming up with sensible
defaults if it could count on this information? Of course, all these
parameters would have their own defaults --- the user won't necessarily
know or have an accurate estimate for each and every one of them.
Also, there is an extremely good reason why Postgres will never be set
up to try to take over the whole machine by default: most of the
developers run multiple postmasters on their machines.
Wouldn't this be covered by the above suggestion? One of the switches
for the initdb command could allow the user to specify how many instances
will be run (I assume you're talking about having different instances
listening on different ports for increased concurrency-related benefits?).
Does my suggestion make more sense now? Or is it still too unrealistic to
make it work properly/safely?
Carlos
--
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional switches to initdb so that the user can tell it about
estimates of how the DB will be used: estimated size of the DB, estimated
percentage of activity that will involve writing, estimated percentage of
activity that will be transactions, percentage that will use indexes,
percentage of queries that will be complex, etc.
If the person knows all that, why wouldn't they know to just change the
config parameters?
Mike Stone
Carlos Moreno <moreno_pg@mochima.com> writes:
Tom Lane wrote:
But the fundamental problem remains that we don't know that much about
how the installation will be used.
Notice that the second part of my suggestion covers this --- have
additional switches to initdb
That's been proposed and rejected before, too; the main problem being
that initdb is frequently a layer or two down from the user (eg,
executed by initscripts that can't pass extra arguments through, even
assuming they're being invoked by hand in the first place).
regards, tom lane
Maybe he's looking for a switch for initdb that would make it
interactive and quiz you about your expected usage-- sort of a magic
auto-configurator wizard doohicky? I could see that sort of thing being
nice for the casual user or newbie who otherwise would have a horribly
mis-tuned database. They could instead have only a marginally mis-tuned
database :)
Hello.
Just my 2 cents, and not looking at the technical aspects:
setting up PSQL is the weakest point of PSQL, as we have experienced
ourselves; once it is running it is great.
I can imagine that a lot of people stop after their first trials, once
they have experienced the trouble and performance of a standard setup.
Each of those is of course a lost user forever.
So anything that could be done to provide an easier and BETTER setup
would strongly enhance PSQL.
My 2 cents.
Henk Sanders
On Fri, Apr 27, 2007 at 07:36:52AM -0700, Mark Lewis wrote:
Maybe he's looking for a switch for initdb that would make it
interactive and quiz you about your expected usage-- sort of a magic
auto-configurator wizard doohicky? I could see that sort of thing being
nice for the casual user or newbie who otherwise would have a horribly
mis-tuned database. They could instead have only a marginally mis-tuned
database :)
However you implement it, anyone who can answer all of those questions
is probably capable of reading and understanding the performance section
in the manual.
It's probably more practical to have a separate script that looks at the
running system (RAM, disks, pg config, db size, indices, stats, etc.)
and makes suggestions -- if someone wants to write such a thing.
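For what it's worth, the advisory core of such a script could be tiny. A hedged sketch (the stat names and thresholds are invented for illustration; a real version would pull live values from pg_settings and the pg_stat views):

```python
def advise(stats: dict) -> list:
    """Turn a snapshot of system/DB stats into human-readable suggestions."""
    tips = []
    # If the database dwarfs shared_buffers, the cache is probably too small.
    if stats["db_size_mb"] > 10 * stats["shared_buffers_mb"]:
        tips.append("shared_buffers looks small relative to the database size")
    # Stale planner statistics lead to bad plans.
    if stats["days_since_analyze"] > 7:
        tips.append("statistics are stale; run ANALYZE or enable autovacuum")
    return tips

print(advise({"db_size_mb": 5000, "shared_buffers_mb": 32,
              "days_since_analyze": 30}))
```

Because it only prints advice, it sidesteps the objections to changing a running server's configuration behind the DBA's back.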
Mike Stone
On Apr 27, 2007, at 3:30 PM, Michael Stone wrote:
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional switches to initdb so that the user can tell it about
estimates of how the DB will be used: estimated size of the DB, estimated
percentage of activity that will involve writing, estimated percentage of
activity that will be transactions, percentage that will use indexes,
percentage of queries that will be complex, etc.
If the person knows all that, why wouldn't they know to just change
the config parameters?
Because knowing your expected workload is a lot easier for many
people than knowing what every GUC does.
Personally, I think it would be a tremendous start if we just
provided a few sample configs like MySQL does. Or if someone wanted
to get fancy they could stick a web page somewhere that would produce
a postgresql.conf based simply on how much available RAM you had,
since that's one of the biggest performance-hampering issues we run
into (ie: shared_buffers left at the default of 32MB).
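A sketch of such a RAM-only generator, using common rules of thumb of the era (roughly a quarter of RAM for shared_buffers, three quarters for effective_cache_size -- these proportions are illustrative, not official recommendations):

```python
def conf_from_ram(ram_mb: int) -> str:
    """Emit a minimal postgresql.conf fragment from total RAM alone."""
    shared_buffers = ram_mb // 4          # ~25% of RAM for PG's buffer cache
    effective_cache = (ram_mb * 3) // 4   # ~75%: what the OS is likely to cache
    return ("shared_buffers = %dMB\n" % shared_buffers
            + "effective_cache_size = %dMB\n" % effective_cache)

print(conf_from_ram(4096))
```

Even something this crude would beat leaving shared_buffers at a tiny default on a machine with gigabytes of RAM.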
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Michael Stone wrote:
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional
switches to initdb
<snip>
If the person knows all that, why wouldn't they know to just change the
config parameters?
Exactly.. What I think would be much more productive is to use the great amount
of information that PG tracks internally and auto-tune the parameters based on
it. For instance:
Why does the user need to manually track max_fsm_pages and max_fsm_relations? I
bet there are many users who have never taken the time to understand what these
mean and wonder why performance still stinks after vacuuming their database
(spoken from my own experience).
How about work_mem? shared_buffers? column statistics sizes? random_page_cost?
Couldn't some fairly simple regression tests akin to a VACUUM process spot
potential problems? "Hey, it looks like you need more fsm_relations.. I bumped
that up automatically for you". Or "These indexes look bloated, shall I
automatically reindex them for you?"
I'm sure there are many more examples that, with some creative thinking, could
be auto-adjusted to match the usage patterns of the database. PG does an
excellent job of exposing the variables to the users, but mostly avoids telling
the user what to do or doing it for them. Instead, it is up to the user to know
where to look, what to look for, and how to react to things to improve
performance. This is not all bad, but it assumes that all users are hackers
(which used to be true), and it certainly doesn't help when the average
SQL Server admin tries out Postgres and is surprised at the things they are now
responsible for managing. PG is certainly *not* the only database to suffer
from this syndrome, I know.
I like to think of my systems as good employees. I don't want to have to
micromanage everything they do. I want to tell them "here's what I want done",
and assuming I made a good hiring choice, they will do it and take some liberty
to adjust parameters where needed to achieve the spirit of the goal, rather than
blindly do something inefficiently because I failed to explain to them the
absolute most efficient way to accomplish the task.
Granted, there are some people who don't like the developers making any
assumptions about their workload. But this doesn't have to be an either/or
proposition. I don't think any control needs to be abandoned. But
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
the patch" ). I just don't know if this feeling has resonated well between new
users and long-term developers. I know it must be grating to have to answer the
same questions over and over and over "have you analyzed? Did you leave
postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO.
In closing, I am not bashing PG! I love it and swear by it. These comments are
purely from an advocacy perspective. I'd love to see the PG user base continue
to grow.
My .02
-Dan
Dan,
Exactly.. What I think would be much more productive is to use the
great amount of information that PG tracks internally and auto-tune the
parameters based on it. For instance:
*Everyone* wants this. The problem is that it's very hard code to write
given the number of variables. I'm working on it but progress is slow,
due to my travel schedule.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
In response to Dan Harris <fbsd@drivefaster.net>:
Michael Stone wrote:
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
Notice that the second part of my suggestion covers this --- have
additional switches to initdb
<snip>
If the person knows all that, why wouldn't they know to just change the
config parameters?
Exactly.. What I think would be much more productive is to use the great amount
of information that PG tracks internally and auto-tune the parameters based on
it. For instance:
Why does the user need to manually track max_fsm_pages and max_fsm_relations? I
bet there are many users who have never taken the time to understand what these
mean and wonder why performance still stinks after vacuuming their database
(spoken from my own experience).
But there are two distinct routes that can be taken if there's not enough
fsm space: add fsm space or vacuum more frequently. I don't want the system
to eat up a bunch of memory for fsm entries if my workload indicates that
I can easily vacuum more frequently.
How about work_mem? shared_buffers? column statistics sizes? random_page_cost?
The only one that seems practical (to me) is random_page_cost. The others are
all configuration options that I (as a DBA) want to be able to decide for
myself. For example, I have some dedicated PG servers that I pretty much
max those values out at, to let PG know that it can use everything on the
system -- but I also have some shared use machines with PG, where I carefully
constrain those values so that PG doesn't muscle other daemons out of their
share of the RAM (work_mem is probably the best example)
It would be nice to have some kind of utility that could tell me what
random_page_cost should be, as I've never felt comfortable tweaking it.
Like some utility to run that would say "based on the seek tests I just
ran, you should set random_page_cost to x". Of course, if such a thing
existed, it could just fill in the value for you. But I haven't figured
out how to pick a good value for that setting, so I have no idea how to
suggest to have it automatically set.
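For illustration, the final step of such a utility might be no more than a clamped ratio. This assumes you have already timed N sequential and N random 8 kB reads of a large file; the genuinely hard part -- defeating OS caching and readahead during those reads -- is deliberately omitted here:

```python
def estimate_random_page_cost(seq_seconds: float, rand_seconds: float) -> float:
    """Suggest random_page_cost from two benchmark timings: the total time to
    do N sequential page reads vs. N random page reads of the same file.
    Clamped so one noisy benchmark can't produce something absurd."""
    ratio = rand_seconds / seq_seconds
    return max(1.0, min(ratio, 10.0))

print(estimate_random_page_cost(1.0, 4.0))  # near the classic default of 4
```

Whether such a raw seek/scan ratio is even the right number to feed the planner is debatable, as discussed further down this thread.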
Couldn't some fairly simple regression tests akin to a VACUUM process spot
potential problems? "Hey, it looks like you need more fsm_relations.. I bumped
that up automatically for you". Or "These indexes look bloated, shall I
automatically reindex them for you?"
A lot of that stuff does happen. A vacuum verbose will tell you what it
thinks you should do, but I don't _want_ it to do it automatically. What
if I create huge temporary tables once a week for some sort of analysis that
overload the fsm space? And if I'm dropping those tables when the analysis
is done, do I want the fsm space constantly adjusting?
Plus, some is just impossible. shared_buffers requires a restart. Do you
want your DB server spontaneously restarting because it thought more
buffers might be nice?
I'm sure there are many more examples, that with some creative thinking, could
be auto-adjusted to match the usage patterns of the database. PG does an
excellent job of exposing the variables to the users, but mostly avoids telling
the user what to do or doing it for them. Instead, it is up to the user to know
where to look, what to look for, and how to react to things to improve
performance. This is not all bad, but it is assuming that all users are hackers
( which used to be true ), but certainly doesn't help when the average SQLServer
admin tries out Postgres and then is surprised at the things they are now
responsible for managing. PG is certainly *not* the only database to suffer
from this syndrome, I know..
I expect the suffering is a result of the fact that databases are non-trivial
pieces of software, and there's no universally simple way to set them up
and make them run well.
I like to think of my systems as good employees. I don't want to have to
micromanage everything they do. I want to tell them "here's what I want done",
and assuming I made a good hiring choice, they will do it and take some liberty
to adjust parameters where needed to achieve the spirit of the goal, rather than
blindly do something inefficiently because I failed to explain to them the
absolute most efficient way to accomplish the task.
That's silly. No software does that. You're asking software to behave like
humans. If that were the case, this would be Isaac Asimov's world, not the
real one.
Granted, there are some people who don't like the developers making any
assumptions about their workload. But this doesn't have to be an either/or
proposition. I don't think any control needs to be abandoned. But
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
the patch" ). I just don't know if this feeling has resonated well between new
users and long-term developers. I know it must be grating to have to answer the
same questions over and over and over "have you analyzed? Did you leave
postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO.
Well, it seems like this is happening where it's practical -- autovacuum is
a good example.
Personally, I wouldn't be opposed to more automagic stuff, just as long as
I have the option to disable it. There are some cases where I still
disable autovac.
In closing, I am not bashing PG! I love it and swear by it. These comments are
purely from an advocacy perspective. I'd love to see PG user base continue to grow.
I expect that part of the problem is "who's going to do it?"
--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
At 10:36a -0400 on 27 Apr 2007, Tom Lane wrote:
That's been proposed and rejected before, too; the main problem being
that initdb is frequently a layer or two down from the user (eg,
executed by initscripts that can't pass extra arguments through, even
assuming they're being invoked by hand in the first place).
And following after Dan Harris' response . . .
So what's the problem with having some sort of cronjob contrib module
that utilizes the actual and current statistics to make
recommendations? I don't think it'd be right to simply change the
configuration options as it sees fit (especially as it was pointed
out that many run multiple postmasters or have other uses for the
machines in question), but perhaps it could send a message (email?)
along the lines of "Hey, I'm currently doing this many of X
transactions, against this much of Y data, and working under these
constraints. You might get better performance (in this area ... ) if
you altered the configuration options like so: ..."
Certainly not for the masters, but perhaps for standard-installation
sorts of deals, sort of like bringing up the rear . . . just a thought.
Kevin
On Fri, Apr 27, 2007 at 02:40:07PM -0400, Kevin Hunter wrote:
out that many run multiple postmasters or have other uses for the
machines in question), but perhaps it could send a message (email?)
along the lines of "Hey, I'm currently doing this many of X
transactions, against this much of Y data, and working under these
constraints. You might get better performance (in this area ... ) if
you altered the configuration options like so: ..."
Or store the values in the DB for later trending analysis -- witness
Oracle's Statspack.
Bill,
The only one that seems practical (to me) is random_page_cost. The
others are all configuration options that I (as a DBA) want to be able
to decide for myself.
Actually, random_page_cost *should* be a constant "4.0" or "3.5", which
represents the approximate ratio of seek to scan speed that has been
relatively constant across 6 years of HDD technology. The only reason we
make it a configuration variable is that there are defects in our cost model
which cause users to want to tinker with it.
Mind you, that's gotten better in recent versions as well. Lately I mostly
tinker with effective_cache_size and the various cpu_* stats rather than
modifying random_page_cost.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
Bill Moran wrote:
In response to Dan Harris <fbsd@drivefaster.net>:
<snip>
Why does the user need to manually track max_fsm_pages and max_fsm_relations? I
bet there are many users who have never taken the time to understand what these
mean and wonder why performance still stinks after vacuuming their database
(spoken from my own experience).
But there are two distinct routes that can be taken if there's not enough
fsm space: add fsm space or vacuum more frequently. I don't want the system
to eat up a bunch of memory for fsm entries if my workload indicates that
I can easily vacuum more frequently.
There's no magic bullet here, but heuristics should be able to tell us whether
you can "easily vacuum more frequently". And again, I said these things would
be *optional*. Like an item in postgresql.conf
"i_have_read_the_manual_and_know_what_this_all_means = false #default false".
If you change it to true, you have all the control you're used to and nothing
will get in your way.
How about work_mem? shared_buffers? column statistics sizes? random_page_cost?
The only one that seems practical (to me) is random_page_cost. The others are
all configuration options that I (as a DBA) want to be able to decide for
myself. For example, I have some dedicated PG servers that I pretty much
max those values out at, to let PG know that it can use everything on the
system -- but I also have some shared use machines with PG, where I carefully
constrain those values so that PG doesn't muscle other daemons out of their
share of the RAM (work_mem is probably the best example)
Just because you carefully constrain it does not preclude the ability for
program logic to maintain statistics to do what I suggested.
It would be nice to have some kind of utility that could tell me what
random_page_cost should be, as I've never felt comfortable tweaking it.
Like some utility to run that would say "based on the seek tests I just
ran, you should set random_page_cost to x". Of course, if such a thing
existed, it could just fill in the value for you. But I haven't figured
out how to pick a good value for that setting, so I have no idea how to
suggest to have it automatically set.
Me either, but I thought if there's a reason it's user-settable, there must be
some demonstrable method for deciding what is best.
Couldn't some fairly simple regression tests akin to a VACUUM process spot
potential problems? "Hey, it looks like you need more fsm_relations.. I bumped
that up automatically for you". Or "These indexes look bloated, shall I
automatically reindex them for you?"
A lot of that stuff does happen. A vacuum verbose will tell you what it
thinks you should do, but I don't _want_ it to do it automatically. What
if I create huge temporary tables once a week for some sort of analysis that
overload the fsm space? And if I'm dropping those tables when the analysis
is done, do I want the fsm space constantly adjusting?
I understand *you* don't want it done automatically. But my suspicion is that
there are a lot more newbie pg admins who would rather let the system do
something sensible as a default. Again, you sound defensive that somehow my
ideas would take power away from you. I'm not sure why that is, but certainly
I'm not suggesting that. An auto-pilot mode is not a bad idea just because a
few pilots don't want to use it.
Plus, some is just impossible. shared_buffers requires a restart. Do you
want your DB server spontaneously restarting because it thought more
buffers might be nice?
Well, maybe look at the bigger picture and see if it can be fixed to *not*
require a program restart? Or.. take effect on the next pid that gets created?
This is a current limitation, but doesn't need to be one for eternity does it?
I'm sure there are many more examples, that with some creative thinking, could
be auto-adjusted to match the usage patterns of the database. PG does an
excellent job of exposing the variables to the users, but mostly avoids telling
the user what to do or doing it for them. Instead, it is up to the user to know
where to look, what to look for, and how to react to things to improve
performance. This is not all bad, but it is assuming that all users are hackers
( which used to be true ), but certainly doesn't help when the average SQLServer
admin tries out Postgres and then is surprised at the things they are now
responsible for managing. PG is certainly *not* the only database to suffer
from this syndrome, I know..
I expect the suffering is a result of the fact that databases are non-trivial
pieces of software, and there's no universally simple way to set them up
and make them run well.
Speaking as a former SQL Server admin ( from day 1 of the Sybase fork up to
version 2000 ), I can say there *is* a way to make them simple. It's certainly
not a perfect piece of software, but the learning curve speaks for itself. It
can auto-shrink your databases ( without locking problems ). Actually it pretty
much runs itself. It auto-allocates RAM for you ( up to the ceiling *you*
control ). It automatically re-analyzes itself.. I was able to successfully
manage several servers with not insignificant amounts of data in them for many
years without being a trained DBA. After switching to PG, I found myself having
to twiddle with all sorts of settings that it seemed the system should just
have known about without me having to tell it.
I'm not saying it was simple to make it do that. MS has invested LOTS of money
and effort into making it that way. I don't expect PG to have features like
that tomorrow or even next release. But, I feel it's important to make sure
that those who *can* realistically take steps in that direction understand this
point of view ( and with Josh's other reply to this, I think many do ).
I like to think of my systems as good employees. I don't want to have to
micromanage everything they do. I want to tell them "here's what I want done",
and assuming I made a good hiring choice, they will do it and take some liberty
to adjust parameters where needed to achieve the spirit of the goal, rather than
blindly do something inefficiently because I failed to explain to them the
absolute most efficient way to accomplish the task.
That's silly. No software does that. You're asking software to behave like
humans. If that were the case, this would be Isaac Asimov's world, not the
real one.
It's not silly. There are plenty of systems that do that. Maybe you just
haven't used them. Again, SQL Server did a lot of those things for me. I
didn't have to fiddle with checkboxes or multi-select tuning options. It
learned what its load was and reacted appropriately. I never had to stare at
planner outputs and try to figure out why the heck it chose that plan.
Although, I certainly could have if I wanted to. It has a tool called the SQL
Profiler which will "watch" your workload on the database, do regression testing
and suggest ( and optionally implement with a single click ) indexes on your
tables. I've been wanting to do this for years with PG, and had a small start
on a project to do just that actually.
Granted, there are some people who don't like the developers making any
assumptions about their workload. But this doesn't have to be an either/or
proposition. I don't think any control needs to be abandoned. But
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
the patch" ). I just don't know if this feeling has resonated well between new
users and long-term developers. I know it must be grating to have to answer the
same questions over and over and over "have you analyzed? Did you leave
postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO.

Well, it seems like this is happening where it's practical -- autovacuum is
a good example.
Agreed, this is a huge step forward. And again, I'm not taking an offensive
posture on this. Just that I think it's worth giving my .02 since I have had
strong feelings about this for awhile.
Personally, I wouldn't be opposed to more automagic stuff, just as long as
I have the option to disable it. There are some cases where I still
disable autovac.

In closing, I am not bashing PG! I love it and swear by it. These comments are
purely from an advocacy perspective. I'd love to see the PG user base continue to grow.

I expect that part of the problem is "who's going to do it?"
Yes, this is the classic problem. I'm not demanding anyone pick up the ball and
jump on this today, tomorrow, etc.. I just think it would be good for those who
*could* make a difference to keep those goals in mind when they continue. If
you have the right mindset, this problem will fix itself over time.
-Dan
Dan,
Yes, this is the classic problem. I'm not demanding anyone pick up the
ball and jump on this today, tomorrow, etc.. I just think it would be
good for those who *could* make a difference to keep those goals in mind
when they continue. If you have the right mindset, this problem will
fix itself over time.
Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took
Microsoft 3-4 years, and theirs still has major issues last I checked. And
both of those DBs support fewer OSes than we do. I think it's going to
take more than the *right mindset* and my spare time.
I appreciate your efforts in this regard. Do you have a formal project
plan for this? If you can share it with me, I'll take a look and see if
there is anything I can do to help out.
Nope, just some noodling around on the configurator:
www.pgfoundry.org/projects/configurator
I am on the verge of starting a Java UI that will query a bunch of the
pg_* tables and give the user information about wasted table space,
index usage, table scans, slow-running queries and spoon-feed it in a
nice attractive interface that can be a real-time system monitor tool.
This could be a cooperative project or might have some redundancy with
what you're up to.
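The catalog queries such a monitor would run can already be sketched today. Below is a rough Python sketch of the kind of checks described above (wasted space, index usage, table scans); the statistics views named are the standard pg_stat_* ones, but the thresholds and the exact column set are assumptions, not what any finished tool would ship with:

```python
# Sketch of pg_* catalog queries a monitoring UI might run.
# Assumes the stats collector is enabled; thresholds are illustrative only.

MONITOR_QUERIES = {
    # tables read mostly by sequential scans (candidates for new indexes)
    "seq_scan_heavy": """
        SELECT relname, seq_scan, idx_scan
        FROM pg_stat_user_tables
        WHERE seq_scan > 10 * COALESCE(idx_scan, 0)
        ORDER BY seq_scan DESC""",
    # indexes that are never used (wasted space and write overhead)
    "unused_indexes": """
        SELECT relname, indexrelname, idx_scan
        FROM pg_stat_user_indexes
        WHERE idx_scan = 0""",
}

def flag_seq_scan_heavy(rows, ratio=10):
    """Given (relname, seq_scan, idx_scan) rows, return the tables whose
    sequential scans outnumber index scans by more than `ratio`."""
    return [r[0] for r in rows if r[1] > ratio * (r[2] or 0)]
```

A UI could run these on a timer and highlight anything flagged; the same rows feed a "suggest an index here" hint list.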
I'd be *very* interested in collaborating with you on this. Further, we
could feed DTrace (& systemtap?) into the interface to get data that
PostgreSQL doesn't currently produce.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On Fri, 27 Apr 2007, Josh Berkus wrote:
Dan,
Yes, this is the classic problem. I'm not demanding anyone pick up the
ball and jump on this today, tomorrow, etc.. I just think it would be
good for those who *could* make a difference to keep those goals in mind
when they continue. If you have the right mindset, this problem will
fix itself over time.

Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took
Microsoft 3-4 years, and theirs still has major issues last I checked. And
both of those DBs support fewer OSes than we do. I think it's going to
take more than the *right mindset* and my spare time.
I think there are a couple different things here.
1. full autotuning
as you say, this is very hard and needs a lot of info about your
particular database usage.
2. getting defaults that are closer to right than the current ones.
this is much easier. for this nobody is expecting that the values are
right, we're just begging for some tool to get us within a couple orders
of magnitude of what's correct.
the current defaults are appropriate for a single CPU with 10's of MB of
RAM and a single drive.
nowadays you have people trying to run quick-and-dirty tests on some spare
hardware they have lying around (waiting for another project) that's got
4-8 CPUs with 10's of GB of RAM and a couple dozen drives.
these people don't know about database tuning. they can learn, but they
want to see if Postgres is even in the ballpark. if the results are close
to acceptable they will ask questions and research the tuning, but if the
results are orders of magnitude lower than they need to be they'll just
say that Postgres is too slow and try another database.
an autodefault script that was written assuming that Postgres has the box
to itself would be a wonderful start.
I think the next step would be to be able to tell the script 'only plan on
using 1/2 of this box',
and beyond that would be the steps that you are thinking of where the
usage pattern is considered.
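A first cut at such an autodefault script could be as small as scaling a few settings from physical RAM, with a fraction knob for the "only use half the box" case. The sketch below uses ratios that are common rules of thumb from this list, not project recommendations:

```python
def suggest_settings(total_ram_mb, fraction=1.0):
    """Suggest rough postgresql.conf values from physical RAM.
    `fraction` is the share of the box Postgres may plan on using.
    The ratios are rules of thumb, not official recommendations."""
    ram = total_ram_mb * fraction
    return {
        # ~25% of available RAM for shared buffers
        "shared_buffers": "%dMB" % max(32, int(ram * 0.25)),
        # tell the planner most of the rest is OS cache
        "effective_cache_size": "%dMB" % int(ram * 0.75),
        # per-sort memory; kept modest since many backends may sort at once
        "work_mem": "%dMB" % max(1, int(ram / 512)),
        "maintenance_work_mem": "%dMB" % max(16, int(ram / 16)),
    }

print(suggest_settings(4096))        # assume Postgres has the whole box
print(suggest_settings(4096, 0.5))   # "only plan on using 1/2 of this box"
```

Even something this crude would get a quick-and-dirty test within shouting distance of sane, which is all step 2 above asks for.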
but when every performance question is answered with "did you change the
defaults? they are way too low for modern hardware, raise them by 2 orders
of magnitude and then we'll start investigating", that first impression has
already been lost.
David Lang
Date: Sat, 28 Apr 2007 10:31:35 +0200
From: "Harald Armin Massa" <haraldarminmassa@gmail.com>
To: "Carlos Moreno" <moreno_pg@mochima.com>
Subject: Re: Feature Request --- was: PostgreSQL Performance Tuning
Cc: "PostgreSQL Performance" <pgsql-performance@postgresql.org>
Carlos,
about your feature proposal: as I learned, nearly all performance
configuration can be done by editing the postgresql.conf file and making the
postmaster re-read it.
So, WHY at all should those parameters be guessed at the installation of the
database? Wouldn't it be a safer point in time to have some postgresql-tune
utility, which gets run after the installation, and maybe every once in a
while? That tool can check vital information like the database-size-to-memory
ratio and suggest a new postgresql.conf.
That tool needs NO INTEGRATION whatsoever - it can be developed and deployed
totally independently and later only be bundled.
Does my suggestion make more sense now? Or is it still too unrealistic to
make it work properly/safely?
And as this tool can be tested separately, and does not need a new initdb
every time ... it can be developed more easily.
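The edit-and-re-read cycle such a tool needs is straightforward: rewrite the changed parameters in postgresql.conf, then signal the postmaster (e.g. via `pg_ctl reload`), which re-reads the file without a restart. A minimal sketch, where the data directory path and the parameter values are placeholders:

```python
# Sketch: rewrite selected parameters in postgresql.conf, then reload.
# The conf path and values are placeholders; `pg_ctl reload` sends the
# postmaster a SIGHUP so it re-reads the file without a restart.
import re
import subprocess

def apply_overrides(conf_text, overrides):
    """Return conf_text with each `name = value` line replaced
    (uncommenting it if needed), appending any name not found."""
    lines = conf_text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        m = re.match(r"\s*#?\s*(\w+)\s*=", line)
        if m and m.group(1) in overrides:
            lines[i] = "%s = %s" % (m.group(1), overrides[m.group(1)])
            seen.add(m.group(1))
    for name, value in overrides.items():
        if name not in seen:
            lines.append("%s = %s" % (name, value))
    return "\n".join(lines) + "\n"

def reload_postmaster(pgdata="/var/lib/pgsql/data"):  # placeholder path
    subprocess.check_call(["pg_ctl", "reload", "-D", pgdata])
```

Since the rewrite step is a pure text transformation, it can be unit-tested with no running server at all, which supports the point that this tool can be developed completely independently.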
Maybe there is even a pointy flashy version possible (perhaps even for money
:) which gives nice graphics and "optimizes", like those Windows Optimizers.
:) I am sure some DBAs in BIGCOMPs would be thrilled :)
May that be a possible way?
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
------=_Part_239023_22416331.1177749095813
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Carlos,<br><br>about your feature proposal: as I learned, nearly all Perfom=
ance.Configuration can be done by editing the .INI file and making the Post=
master re-read it.<br><br>So, WHY at all should those parameters be guessed=
at the installation of the database? Would'nt it be a saver point of t=
ime to have some postgresql-tune=20
<br>utilitiy, which gets run after the installation, maybe every once in a =
while. That tool can check vital information like "Databasesize to mem=
ory relation"; and suggest a new postgresql.ini.<br><br>That tool need=
s NO INTEGRATION whatsoever - it can be developed, deployed totally indepen=
dend and later only be bundled.=20
<br><br><div><blockquote class=3D"gmail_quote" style=3D"border-left: 1px so=
lid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Does=
my suggestion make more sense now? Or is it still too unrealist=
ic to<br>make it work properly/safely?
</blockquote><div><br></div></div>And as this tool can be tested seperately=
, does not need a new initdb every time ... it can be developed more easily=
. <br><br>Maybe there is even a pointy flashy version possible (perhaps eve=
n for money :) which gives nice graphics and "optimized", like th=
ose Windows Optimizers. :) I am sure, some DBAs in BIGCOMPs would be =
thrilled :)
<br><br>May that be a possible way?<br><br>Harald<br><br><br>-- <br>GHUM Ha=
rald Massa<br>persuadere et programmare<br>Harald Armin Massa<br>Reinsburgs=
tra=DFe 202b<br>70197 Stuttgart<br>0173/9409607<br>fx 01212-5-13695179 <br>
-<br>Python: the only language with more web frameworks than keywords.
------=_Part_239023_22416331.1177749095813--