Volunteer to build a configuration tool

Started by Campbell, Lance · almost 19 years ago · 33 messages · docs
#1Campbell, Lance
lance@uiuc.edu

I am a Java Software architect, DBA, and project manager for the
University of Illinois, Department of Web Services. We use PostgreSQL
to serve about 2 million pages of dynamic content a month; everything
from calendars, surveys, forms, discussion boards, RSS feeds, etc. I am
really impressed with this tool.

The only major problem area I have found where PostgreSQL is really
lacking is in "what should my initial configuration settings be?" I
realize that there are many elements that can impact a DBA's specific
database settings but it would be nice to have a "configuration tool"
that would get someone up and running better in the beginning.

This is my idea:

A JavaScript HTML page that would have some basic questions at the top:

1) How much memory do you have?

2) How many connections will be made to the database?

3) What operating system do you use?

4) Etc...

Next the person would press a button, "generate", found below the
questions. The JavaScript HTML page would then generate content for two
iframes at the bottom of the page. One iframe would contain the
contents of the postgresql.conf file. The postgresql.conf settings
would be tailored more to the individual's needs than the standard
default file. The second iframe would contain the default settings one
should consider using with their operating system.
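A minimal sketch of what the "generate" logic behind such a page might look like (the specific formulas are invented placeholders for illustration, not tuning advice from this thread):

```javascript
// Hypothetical sketch of the "generate" button's logic. The heuristics
// (~25% of RAM for shared_buffers, and so on) are illustrative placeholders.
function recommendSettings(memMB, maxConnections) {
  var sharedBuffers = Math.floor(memMB / 4); // ~25% of RAM
  var workMem = Math.max(1, Math.floor(memMB / (4 * maxConnections)));
  return "shared_buffers = " + sharedBuffers + "MB\n" +
         "work_mem = " + workMem + "MB\n" +
         "max_connections = " + maxConnections;
}

// The page would then write this string into the postgresql.conf iframe, e.g.:
// confIframe.contentDocument.body.textContent = recommendSettings(mem, conns);
```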

My web team would be very happy to develop this for the PostgreSQL
project. It would have saved us a lot of time by having a
configuration tool in the beginning. I am willing to make this a very
high priority for my team.

Thanks,

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

#2Mario Gonzalez
gonzalemario@gmail.com
In reply to: Campbell, Lance (#1)
Re: Volunteer to build a configuration tool

On 18/06/07, Campbell, Lance <lance@uiuc.edu> wrote:

Next the person would press a button, "generate", found below the questions.
The JavaScript HTML page would then generate content for two Iframes at the
bottom on the page. One Iframe would contain the contents of the
postgresql.conf file. The postgresql.conf settings would be tailored more
to the individuals needs than the standard default file. The second Iframe
would contain the default settings one should consider using with their
operating system.

I think it could be a great help to newbies. IMVHO a bash script using
dialog could be better than a JavaScript file. There are many
administrators with no graphical browser or with JavaScript disabled.

--
http://www.advogato.org/person/mgonzalez/

#3Campbell, Lance
lance@uiuc.edu
In reply to: Mario Gonzalez (#2)
Re: Volunteer to build a configuration tool

Mario,
The JavaScript configuration tool I proposed would not be in the install
of PostgreSQL. It would be an HTML page. It would be part of the HTML
documentation or it could be a separate HTML page that would be linked
from the HTML documentation.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu


#4Y Sidhu
ysidhu@gmail.com
In reply to: Campbell, Lance (#3)
Re: [DOCS] Volunteer to build a configuration tool

On 6/18/07, Campbell, Lance <lance@uiuc.edu> wrote:

Mario,
The JavaScript configuration tool I proposed would not be in the install
of PostgreSQL. It would be an HTML page. It would be part of the HTML
documentation or it could be a separate HTML page that would be linked
from the HTML documentation.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu


EXCELLENT idea Lance.

--
Yudhvir Singh Sidhu
408 375 3134 cell

#5Mario Gonzalez
gonzalemario@gmail.com
In reply to: Campbell, Lance (#3)
Re: Volunteer to build a configuration tool

On 18/06/07, Campbell, Lance <lance@uiuc.edu> wrote:

Mario,
The JavaScript configuration tool I proposed would not be in the install
of PostgreSQL. It would be an HTML page. It would be part of the HTML
documentation or it could be a separate HTML page that would be linked
from the HTML documentation.

Ok, then I'm not the correct person to make that decision, however
just a tip: the PostgreSQL documentation was written in DocBook SGML.

--
http://www.advogato.org/person/mgonzalez/

#6James Neethling
james@silversphere.co.za
In reply to: Campbell, Lance (#1)
Re: Volunteer to build a configuration tool

This is my idea:

A JavaScript HTML page that would have some basic questions at the top:

1) How much memory do you have?

2) How many connections will be made to the database?

3) What operating system do you use?

4) Etc...

Next the person would press a button, "generate", found below the
questions. The JavaScript HTML page would then generate content for
two Iframes at the bottom on the page. One Iframe would contain the
contents of the postgresql.conf file. The postgresql.conf settings
would be tailored more to the individuals needs than the standard
default file. The second Iframe would contain the default settings one
should consider using with their operating system.

My web team would be very happy to develop this for the PostgreSQL
project. It would have saved us a lot of time by having a
configuration tool in the beginning. I am willing to make this a very
high priority for my team.

Hi Lance,

I agree that having a page that can assist in generating a base
configuration file is an excellent way to start off with a good
configuration that can assist a system administrator in getting half way
to a good configuration. We've recently gone through a process of
configuring a machine and it is a time consuming task of testing and
benchmarking various configuration details.

My thoughts:
Using the browser is a great idea as a universal platform. I can
foresee a problem in that some users won't have GUI access to the
machine that they are setting up. I don't have much direct experience in
this field, but I suspect that a great number of installations happen
'headless'? This can easily be circumvented by hosting the configuration
builder on a public internet site, possibly postgresql.org?

Also, JavaScript isn't the easiest language to use to make all the
decisions needed for the various configuration options. Would
it not be a better idea to host a configuration builder centrally,
possibly on postgresql.org, and have the documentation reference it,
including the docs that come packaged with PostgreSQL (README, INSTALL
documentation?). This would mean that you wouldn't be able to package
the configuration builder, but you would be able to implement more
application logic and more complex decision making in a hosted
application. Of course, I have no idea of the skills that your team
already have :)

To add ideas: perhaps a more advanced tool could add comments
indicating a suggested range for each particular setting. For example,
with 2GB of RAM, it chooses a work_mem of, say, 768MB, with a comment
indicating a suggested range of 512MB - 1024MB.
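In postgresql.conf terms, such an annotated suggestion might look like this (a sketch using the numbers above):

```
# suggested range for 2GB of RAM: 512MB - 1024MB
work_mem = 768MB
```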

Thanks for taking the time to put this together and for offering the
services of your team.

Kind regards,
James

#7Andrej Ricnik-Bay
andrej.groups@gmail.com
In reply to: Campbell, Lance (#3)
Re: Volunteer to build a configuration tool

On 6/19/07, Campbell, Lance <lance@uiuc.edu> wrote:

Mario,

Lance,

The JavaScript configuration tool I proposed would not be in the install
of PostgreSQL. It would be an HTML page. It would be part of the HTML
documentation or it could be a separate HTML page that would be linked
from the HTML documentation.

So you're not after a tool that configures postgres at all,
just one that can give you sensible guesstimates for some
parameters based on your intended use?

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

#8Greg Smith
gsmith@gregsmith.com
In reply to: Campbell, Lance (#1)
Re: Volunteer to build a configuration tool

On Mon, 18 Jun 2007, Campbell, Lance wrote:

The postgresql.conf settings would be tailored more to the individuals
needs than the standard default file. The second Iframe would contain
the default settings one should consider using with their operating
system.

I'd toyed with making a Javascript based tool for this but concluded it
wasn't ever going to be robust enough for my purposes. It wouldn't hurt
to have it around though, as almost anything is an improvement over the
current state of affairs for new users.

As far as prior art goes here, there was an ambitious tool driven by Josh
Berkus called Configurator that tried to address this need but never got
off the ground, you might want to swipe ideas from it. See
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/configurator/configurator/ for
some documents/code and
http://pgfoundry.org/docman/index.php?group_id=1000106 for a handy
Open-Office spreadsheet.

If you want this to take off as a project, make sure you can release the
code under a free software license compatible with the PostgreSQL project,
so others can contribute to it and it can be assimilated by the core
project if it proves helpful. I know I wouldn't spend a minute working on
this if that's not the case.

I'd suggest you try and get the basic look fleshed out with some
reasonable values for the parameters, then release the source and let
other people nail down the parts you're missing. Don't get stressed about
making sure you have a good value to set for everything before releasing a
beta, it's a lot easier for others to come in and help fix a couple of
parameters once the basic framework is in place.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#9david@lang.hm
david@lang.hm
In reply to: Campbell, Lance (#1)
Re: Volunteer to build a configuration tool

one thing to point out to people about this idea is that nothing says that
this page needs to be served via a webserver. If all the calculations are
done in javascript this could be a local file that you open with a
browser.

do any of the text-mode browsers implement javascript? if so then you have
an answer even for the deeply buried isolated headless servers.

David Lang

On Mon, 18 Jun 2007, Campbell, Lance wrote:

I am a Java Software architect, DBA, and project manager for the
University of Illinois, Department of Web Services. We use PostgreSQL
to serve about 2 million pages of dynamic content a month; everything
from calendars, surveys, forms, discussion boards, RSS feeds, etc. I am
really impressed with this tool.

The only major problem area I have found where PostgreSQL is really
lacking is in "what should my initial configuration settings be?" I
realize that there are many elements that can impact a DBA's specific
database settings but it would be nice to have a "configuration tool"
that would get someone up and running better in the beginning.

This is my idea:

A JavaScript HTML page that would have some basic questions at the top:

1) How much memory do you have?

2) How many connections will be made to the database?

3) What operating system do you use?

4) Etc...

#10Steve Atkins
steve@blighty.com
In reply to: david@lang.hm (#9)
Re: Volunteer to build a configuration tool

On Jun 18, 2007, at 4:09 PM, david@lang.hm wrote:

one thing to point out to people about this idea is that nothing
says that this page needs to be served via a webserver. If all the
calculations are done in javascript this could be a local file that
you open with a browser.

do any of the text-mode browsers implement javascript? if so then
you have an answer even for the deeply buried isolated headless
servers.

It doesn't really matter.

The implementation is likely to be trivial, and could be independently
knocked out by anyone in their favorite language in a few hours.

The tricky bits are going to be defining the problem and creating the
algorithm to do the maths from input to output.

If that's so, the language or platform the proof-of-concept code is
written in isn't that important, as it's likely to be portable to
anything else without too much effort.

But the tricky bits seem quite tricky (and the first part, defining the
problem, is something where someone developing it on their
own, without some discussion with other users and devs
could easily end up way off in the weeds).

Cheers,
Steve


#11Greg Smith
gsmith@gregsmith.com
In reply to: david@lang.hm (#9)
Re: Volunteer to build a configuration tool

On Mon, 18 Jun 2007, david@lang.hm wrote:

do any of the text-mode browsers implement javascript?

http://links.twibright.com/

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#12Euler Taveira de Oliveira
In reply to: Campbell, Lance (#1)
Re: Volunteer to build a configuration tool

Campbell, Lance wrote:

Next the person would press a button, “generate”, found below the
questions. The JavaScript HTML page would then generate content for two
Iframes at the bottom on the page. One Iframe would contain the
contents of the postgresql.conf file. The postgresql.conf settings
would be tailored more to the individuals needs than the standard
default file. The second Iframe would contain the default settings one
should consider using with their operating system.

Man, it's not that easy. :-) Mainly because you will need some database
activity. For example, work_mem, checkpoint_segments, and
checkpoint_timeout depend on the database's dynamics.
Databases are not that static, so another idea is to build a piece of
software that monitors the database and makes modifications based on
some observations (log, stats, etc). Don't forget that some of these
options need a restart. So maybe your tool should just advise the DBA that
he/she could change option X to Y.
Such a tool was proposed later [1] but it's not up to date. :(

[1]: http://pgfoundry.org/projects/pgautotune/
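The "advise, don't apply" idea might be sketched like this (all names and thresholds here are invented for illustration; as of that era, shared_buffers and max_connections needed a restart):

```javascript
// Hypothetical sketch: compare observed activity against current settings
// and emit suggestions rather than applying them, flagging options that
// would require a server restart. Field names are invented placeholders.
var RESTART_REQUIRED = { shared_buffers: true, max_connections: true };

function advise(current, observed) {
  var suggestions = [];
  if (observed.checkpointsTooFrequent) {
    suggestions.push({ option: "checkpoint_segments",
                       value: current.checkpoint_segments * 2 });
  }
  if (observed.sortsSpillToDisk) {
    suggestions.push({ option: "work_mem", value: current.work_mem * 2 });
  }
  // Mark which suggestions the DBA cannot apply without a restart.
  for (var i = 0; i < suggestions.length; i++) {
    suggestions[i].restartRequired = !!RESTART_REQUIRED[suggestions[i].option];
  }
  return suggestions;
}
```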

--
Euler Taveira de Oliveira
http://www.timbira.com/

#13Ray Stell
stellr@cns.vt.edu
In reply to: Steve Atkins (#10)
Re: Volunteer to build a configuration tool

On Mon, Jun 18, 2007 at 04:35:11PM -0700, Steve Atkins wrote:

On Jun 18, 2007, at 4:09 PM, david@lang.hm wrote:

The tricky bits are going to be defining the problem and creating the
alogrithm to do the maths from input to output.

Why not methodically discuss the algorithms on pgsql-performance,
thus improving the chance of being on target up front? Plus, we newbies
get to see what you are thinking, thus expanding our universe. I know I'd
read every word.

Thanks for doing this, btw.

#14Mike Benoit
ipso@snappymail.ca
In reply to: Campbell, Lance (#1)
Re: Volunteer to build a configuration tool

It would be cool if someone started a generic configuration+benchmark
utility that could be used with virtually any software. Something like
this:

1. Create a configuration file parser for your specific application, be
it PostgreSQL, MySQL, Apache, whatever.

2. Create a min/max or X,Y,Z configuration option file that determines
which options to try. ie:

shared_buffers = 1000-20000[1000]   // 1000 is the increment
wal_buffers = 8,16,32
...

3. Create start/stop scripts for the specific application

4. Create a benchmark script for the application that returns relevant
metrics. In PGSQL's case, it would be tied in to PG bench probably. In
Apache's case AB. This utility would of course need to know how to read
the metrics to determine what is "best".

5. Run the utility. Ideally it would use some sort of genetic algorithm
to benchmark the application initially to get base numbers, then
one-by-one apply the different configuration options and re-run the
benchmark. It would output the metrics for each run and once it is done,
pick the best run and let you know what those settings are.
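Steps 2 and 5 above might be sketched like this (parseOptionSpec and the runBenchmark callback are invented names; a real harness would shell out to pgbench or ab and parse its metrics):

```javascript
// Hypothetical sketch of the option-spec parser (step 2) and sweep (step 5).
// "1000-20000[1000]" means min-max with an increment; "8,16,32" is an
// explicit list of values to try.
function parseOptionSpec(spec) {
  var m = spec.match(/^(\d+)-(\d+)\[(\d+)\]$/);
  if (m) {
    var values = [];
    for (var v = Number(m[1]); v <= Number(m[2]); v += Number(m[3])) {
      values.push(v);
    }
    return values;
  }
  return spec.split(",").map(Number); // comma-separated list form
}

// Try each candidate value, score it via the supplied benchmark function,
// and report the best one. runBenchmark stands in for start/benchmark/stop.
function sweep(optionName, spec, runBenchmark) {
  var best = { score: -Infinity, value: null };
  var candidates = parseOptionSpec(spec);
  for (var i = 0; i < candidates.length; i++) {
    var config = {};
    config[optionName] = candidates[i];
    var score = runBenchmark(config); // higher is better
    if (score > best.score) best = { score: score, value: candidates[i] };
  }
  return best;
}
```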

I don't think something like this would be very difficult at all to
write, and it would be modular enough to work for virtually any
application. For a database it would take a while to run depending on
the benchmark script, but even then you could have a "fast" and a "slow"
benchmark script that could be easily run when you first install
PostgreSQL. This way, too, you're not worrying about how much memory the
system has, or how many disks they have, etc... The system will figure
out the best possible settings for a specific benchmark.

Not to mention people could easily take a SQL log of their own
application running, and use that as the benchmark to get "real world"
numbers.

Any other sort of configuration "suggestion" utility will always face
the question of what do you recommend? How much data do you try to get,
and what can be determined from that data to get the best settings? Is
it really going to be that much better than the default, at least enough
better to warrant the work and effort put into it?


--
Mike Benoit <ipso@snappymail.ca>

#15Greg Smith
gsmith@gregsmith.com
In reply to: Mike Benoit (#14)
Re: Volunteer to build a configuration tool

On Wed, 20 Jun 2007, Mike Benoit wrote:

It would be cool if someone started a generic configuration+benchmark
utility that could be used with virtually any software.

It would be cool. It would also be impossible.

Create a benchmark script for the application that returns relevant
metrics. In PGSQL's case, it would be tied in to PG bench probably. In
Apache's case AB. This utility would of course need to know how to read
the metrics to determine what is "best".

The usual situation in these benchmarks is that you get parameters that
adjust along a curve where there's a trade-off between, say, total
throughput and worst-case latency. Specifying "best" here would require a
whole specification language if you want to model how real tuning efforts
work. The AB case is a little simpler, but for PostgreSQL you'd want
something like "With this database and memory sizing, I want the best
throughput possible where maximum latency is usually <5 seconds with 1-30
clients running this transaction, while still maintaining at least 400 TPS
with up to 100 clients, and the crash recovery time can't take more than
10 minutes". There are all sorts of local min/max situations and
non-robust configurations an automated tool will put you into if you don't
force an exhaustive search by being very specific like this.

I don't think something like this would be very difficult at all to
write

Here I just smile and say that proves you've never tried to write one :)
It's a really hard problem that gets harder the more you poke at it.
There's certainly lots of value to writing a utility that automatically
tests out multiple parameter values in a batch and compares the results.
If you're not doing that now, you should consider scripting something up
that does. Going beyond that to having it pick the optimal parameters
more automatically would take AI much stronger than just a genetic
algorithm approach.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#15)
Re: Volunteer to build a configuration tool

Greg Smith <gsmith@gregsmith.com> writes:

On Wed, 20 Jun 2007, Mike Benoit wrote:

I don't think something like this would be very difficult at all to
write

Here I just smile and say that proves you've never tried to write one :)

I'm with Greg on this. It's not that easy to optimize in a
multi-parameter space even if all conditions are favorable, and they
never are.

I think what would be much more useful in the long run is some
serious study of the parameters themselves. For instance,
random_page_cost is a self-admitted oversimplification of reality.
We know that good settings for it depend critically on how large
your DB is relative to your RAM; which means there are at least two
parameters there, but no one's done any serious thinking about how
to disentangle 'em.

regards, tom lane

#17Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#16)
Re: Volunteer to build a configuration tool

On Wed, 20 Jun 2007, Tom Lane wrote:

I think what would be much more useful in the long run is some serious
study of the parameters themselves. For instance, random_page_cost is a
self-admitted oversimplification of reality.

If I could figure out who would sponsor such a study that's what I'd be
doing right now. I have studies on many of the commit-related parameters
I'll have ready in another few days, those are straightforward to map out.
But you know what I have never found? A good benchmark that demonstrates
how well complicated queries perform to run studies on things like
random_page_cost against. Many of the tuning knobs on the query optimizer
seem very opaque to me so far, and I'm not sure how to put together a
proper test to illuminate their operation and map out their useful range.

Here's an example of one of the simplest questions in this area to
demonstrate things I wonder about. Let's say I have a properly indexed
database of some moderate size such that you're in big trouble if you do a
sequential scan. How can I tell if effective_cache_size is in the right
ballpark so it will do what I want to effectively navigate that? People
back into a setting for that parameter right now based on memory in their
system, but I never see anybody going "since your main table is X GB
large, and its index is Y GB, you really need enough memory to set
effective_cache_size to Z GB if you want queries/joins on that table to
perform well".

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#17)
Re: Volunteer to build a configuration tool

Greg Smith <gsmith@gregsmith.com> writes:

On Wed, 20 Jun 2007, Tom Lane wrote:

I think what would be much more useful in the long run is some serious
study of the parameters themselves. For instance, random_page_cost is a
self-admitted oversimplification of reality.

If I could figure out who would sponsor such a study that's what I'd be
doing right now.

Hmm ... Sun? EDB? Greenplum? [I'm afraid Red Hat is not likely to
step up to the plate right now, they have other priorities]

Many of the tuning knobs on the query optimizer
seem very opaque to me so far,

At least some of them are demonstrably broken. The issue here is to
develop a mental model that is both simple enough to work with, and
rich enough to predict real-world behavior.

Here's an example of one of the simplest questions in this area to
demonstrate things I wonder about. Let's say I have a properly indexed
database of some moderate size such that you're in big trouble if you do a
sequential scan. How can I tell if effective_cache_size is in the right
ballpark so it will do what I want to effectively navigate that?

As the guy who put in effective_cache_size, I'd say it's on the broken
side of the fence. Think about how to replace it with a more useful
parameter, not how to determine a good value for it. "Useful" means
both "easy to determine a value for" and "strong for estimating query
costs", which are contradictory to some extent, but that's the problem
to be solved --- and effective_cache_size doesn't really win on either
metric.

To me, the worst catch-22 we face in this area is that we'd like the
optimizer's choices of plan to be stable and understandable, but the
real-world costs of queries depend enormously on short-term conditions
such as how much of the table has been sucked into RAM recently by
other queries. I have no good answer to that one.

regards, tom lane

#19PFC
lists@peufeu.com
In reply to: Tom Lane (#18)
Re: Volunteer to build a configuration tool

To me, the worst catch-22 we face in this area is that we'd like the
optimizer's choices of plan to be stable and understandable, but the
real-world costs of queries depend enormously on short-term conditions
such as how much of the table has been sucked into RAM recently by
other queries. I have no good answer to that one.

Yeah, there is currently no way to tell the optimizer things like:

- this table/portion of a table is not frequently accessed, so it won't
be in the cache, so please use low-seek plans (like bitmap index scan)
- this table/portion of a table is used all the time, so high-seek-count
plans like index scans or nested loops can be used, since everything is
in RAM

Except planner hints (argh), I see no way to give this information to the
machine... since it's mostly in the mind of the DBA. Maybe a per-table
"cache temperature" param (hot, warm, cold), but what about the log table,
the end of which is cached, but not the old records? It's messy.

Still, PG does a pretty excellent job most of the time.

#20Greg Smith
gsmith@gregsmith.com
In reply to: PFC (#19)
Re: Volunteer to build a configuration tool

On Wed, 20 Jun 2007, PFC wrote:

Except planner hints (argh) I see no way to give this information to the
machine... since it's mostly in the mind of the DBA.

And the mind of the DBA has a funny way of being completely wrong some
days about what's really happening under the hood.

Maybe a per-table "cache temperature" param (hot, warm, cold), but what
about the log table, the end of which is cached, but not the old records?
It's messy.

One of the things that was surprising to me when I started looking at the
organization of the PostgreSQL buffer cache is how little gross
information about its contents is available. I kept expecting to find a
summary section where you could answer questions like "how much of the
cache currently has information about index/table X?" used as an input to
the optimizer. I understand that the design model expects much of this to
be unknowable due to the interaction with the OS cache, and in earlier
versions you couldn't make shared_buffers big enough for its contents to
be all that interesting, so until recently this wasn't worth collecting.

But in the current era, where it's feasible to have multi-GB caches
efficiently managed by PG and one can expect processor time is relatively
cheap, it seems to me one way to give a major boost to the optimizer is to
add some overhead to buffer cache management so it collects such
information. When I was trying to do a complete overhaul on the
background writer, the #1 problem was that I had to assemble my own
statistics on what was inside the buffer cache as it was scanned, because
a direct inspection of every buffer is the only way to know things like
what percentage of the cache is currently dirty.

I can't figure out if I'm relieved or really worried to discover that Tom
isn't completely sure what to do with effective_cache_size either.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#21Campbell, Lance
lance@uiuc.edu
In reply to: Greg Smith (#20)
#22Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Campbell, Lance (#21)
#23Mark Lewis
mark.lewis@mir3.com
In reply to: Greg Smith (#20)
#24Greg Smith
gsmith@gregsmith.com
In reply to: Mark Lewis (#23)
#25Greg Smith
gsmith@gregsmith.com
In reply to: Campbell, Lance (#21)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#25)
#27Dave Page
dpage@pgadmin.org
In reply to: Tom Lane (#26)
#28Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Greg Smith (#25)
#29Campbell, Lance
lance@uiuc.edu
In reply to: Greg Smith (#25)
#30Greg Smith
gsmith@gregsmith.com
In reply to: Campbell, Lance (#29)
#31Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Greg Smith (#30)
#32Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#31)
#33Josh Berkus
josh@agliodbs.com
In reply to: Campbell, Lance (#29)