go for a script! / ex: PostgreSQL vs. MySQL

Started by Nick Barr over 22 years ago | 32 messages | general
#1Nick Barr
nicky@chuckie.co.uk

Heya Guys n Gals,

Having been following the thread on "go for a script! / ex: PostgreSQL vs.
MySQL", I thought I would throw something together in Perl. My current issue
is that I only have access to a RH Linux box and so cannot make it
cross-platform on my own :-(. Anyhow, please find it attached. It runs fine
on my box. It doesn't actually write to postgresql.conf because I didn't want
to mess it up; it does, however, write to postgresql.conf.new for the moment.
The diffs seem to be writing correctly. There is a set of parameters at the
top which may need to be tweaked for your platform. I can also carry on
posting new versions to this list if people want. Clearly this lot is open
source, so please feel free to play with it and post patches/new features
back either to the list or to my email directly. In case you can't see my email
address, it is nicky at the domain below.

I will also post it on my website, and as I develop it further new versions
will appear there:

http://www.chuckie.co.uk/postgresql/pg_autoconfig.pl

Is this a useful start?

Nick

Attachments:

pg_autoconfig.pl (application/octet-stream)
#2Josh Berkus
josh@agliodbs.com
In reply to: Nick Barr (#1)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Nick,

Having been following the thread on "go for a script! / ex: PostgreSQL vs.
MySQL", I thought I would throw something together in Perl.

Cool! Would you be willing to work with me so that I can inject some of my
knowledge of .conf tuning?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#3Nick Barr
nicky@chuckie.co.uk
In reply to: Josh Berkus (#2)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Josh Berkus wrote:

Nick,

Having been following the thread on "go for a script! / ex: PostgreSQL vs.
MySQL", I thought I would throw something together in Perl.

Cool! Would you be willing to work with me so that I can inject some of my
knowledge of .conf tuning?

Sounds good to me. I will carry on working on it but I would definitely
need some help, or at least a list of parameters to tweak, and some
recommended values based on data about the puter in question.

So far:

shared_buffers = 1/16th of total memory
effective_cache_size = 80% of the supposed kernel cache.

I guess we may also be able to offer a simple and an advanced mode. Simple
mode would work on these recommended values, but kick it into advanced
mode and the user can tweak things more finely. This would only be
recommended for the gurus out there, of course. This may take a bit more
time to do though.

As I said in the previous email I have only got access to Linux, so
cross-platform help would be good too. I will try to make it as easy to
do cross platform stuff as possible of course.

Nick

#4Vivek Khera
khera@kcilink.com
In reply to: Nick Barr (#1)
Re: go for a script! / ex: PostgreSQL vs. MySQL

"NB" == Nick Barr <nicky@chuckie.co.uk> writes:

NB> So far:

NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

Please take into account the blocksize compiled into PG, too...

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: khera@kciLink.com Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

#5Josh Berkus
josh@agliodbs.com
In reply to: Nick Barr (#3)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Nick,

Sounds good to me. I will carry on working on it but I would definitely
need some help, or at least a list of parameters to tweak, and some
recommended values based on data about the puter in question.

shared_buffers = 1/16th of total memory
effective_cache_size = 80% of the supposed kernel cache.

But only if it's a dedicated DB machine. If it's not, all memory values
should be cut in half.

I guess we may also be able to offer a simple and an advanced mode. Simple
mode would work on these recommended values, but kick it into advanced
mode and the user can tweak things more finely. This would only be
recommended for the gurus out there, of course. This may take a bit more
time to do though.

What I would prefer would be an interactive script which would, by asking the
user simple questions and system scanning, collect all the information
necessary to set:

max_connections
shared_buffers
sort_mem
vacuum_mem
effective_cache_size
random_page_cost
max_fsm_pages
checkpoint_segments & checkpoint_timeout
tcpip_socket

and on the OS, it should set:
shmmax & shmall
and should offer to create a cron job which runs VACUUM ANALYZE at an
appropriate frequency.

As I said in the previous email I have only got access to Linux, so
cross-platform help would be good too. I will try to make it as easy to
do cross platform stuff as possible of course.

Let's get it working on Linux; then we can rely on the community to port it to
other platforms. I myself can work on the ports to Solaris and OS X.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#6Josh Berkus
josh@agliodbs.com
In reply to: Vivek Khera (#4)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Vivek,

NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

Please take into account the blocksize compiled into PG, too...

We can't change the blocksize in a script that only does the .conf file. Or
are you suggesting something else?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#7Vivek Khera
khera@kcilink.com
In reply to: Josh Berkus (#6)
Re: go for a script! / ex: PostgreSQL vs. MySQL

"JB" == Josh Berkus <josh@agliodbs.com> writes:

JB> Vivek,
NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

Please take into account the blocksize compiled into PG, too...

JB> We can't change the blocksize in a script that only does the .conf
JB> file. Or are you suggesting something else?

when you compute optimal shared buffers and effective cache size,
these are in terms of blocksize. so if I have 16k block size, you
can't compute based on default 8k blocksize. at worst, it would have
to be a parameter you pass to the tuning script.
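
To make the block size point concrete, here is a rough sketch of how the two rules of thumb quoted above turn into page counts. It is written in Python purely for illustration (the script under discussion is Perl), and the function name, the `dedicated` flag for the "cut in half" rule mentioned earlier in the thread, and the parameter names are all made up for this example, not taken from pg_autoconfig.pl.

```python
def suggest_memory_settings(total_mem_bytes, kernel_cache_bytes,
                            block_size=8192, dedicated=True):
    """Return shared_buffers and effective_cache_size as page counts.

    block_size must match the block size PostgreSQL was compiled with;
    8192 is only the default and is passed in as a parameter here.
    """
    shared_bytes = total_mem_bytes // 16           # 1/16th of total memory
    cache_bytes = kernel_cache_bytes * 80 // 100   # 80% of the kernel cache
    if not dedicated:
        # Rule from the thread: halve memory values on a shared machine.
        shared_bytes //= 2
        cache_bytes //= 2
    return {
        "shared_buffers": shared_bytes // block_size,
        "effective_cache_size": cache_bytes // block_size,
    }
```

With 1 GB of RAM this gives shared_buffers = 8192 pages for the default 8k block size but only 4096 pages for a 16k build, which is exactly why the block size has to be an input.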

#8Josh Berkus
josh@agliodbs.com
In reply to: Vivek Khera (#7)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Vivek,

when you compute optimal shared buffers and effective cache size,
these are in terms of blocksize. so if I have 16k block size, you
can't compute based on default 8k blocksize. at worst, it would have
to be a parameter you pass to the tuning script.

Oh, yes! Thank you.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#9Sean Chittenden
sean@chittenden.org
In reply to: Vivek Khera (#4)
Re: go for a script! / ex: PostgreSQL vs. MySQL

NB> So far:

NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

Please take into account the blocksize compiled into PG, too...

Would anyone object to a patch that exports the blocksize via a
readonly GUC? Too many tunables are page dependent, which is
infuriating when copying configs from DB to DB. I wish pgsql had some
notion of percentages for values that end with a '%'. -sc

--
Sean Chittenden

#10Bruce Momjian
bruce@momjian.us
In reply to: Sean Chittenden (#9)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Sean Chittenden wrote:

NB> So far:

NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

Please take into account the blocksize compiled into PG, too...

Would anyone object to a patch that exports the blocksize via a
readonly GUC? Too many tunables are page dependent, which is
infuriating when copying configs from DB to DB. I wish pgsql had some
notion of percentages for values that end with a '%'. -sc

Makes sense to me --- we already have some read-only GUC variables.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#11Rod Taylor
rbt@rbt.ca
In reply to: Sean Chittenden (#9)
Re: go for a script! / ex: PostgreSQL vs. MySQL

On Fri, 2003-10-10 at 18:59, Sean Chittenden wrote:

NB> So far:

NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

Please take into account the blocksize compiled into PG, too...

Would anyone object to a patch that exports the blocksize via a
readonly GUC? Too many tunables are page dependent, which is
infuriating when copying configs from DB to DB. I wish pgsql had some
notion of percentages for values that end with a '%'.

Rather than showing the block size, how about we change the tunables to
be physical sizes rather than block based?

effective_cache_size = 1.5GB
shared_buffers = 25MB

Percentages would be slick as well, but doing the above should fix most
of the issue -- and be friendlier to read.
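
For illustration only, this is roughly what a tuning script could do today to accept sizes written that way and convert them to the block counts the settings expect. The function name and the set of accepted suffixes are assumptions for this sketch; nothing here is existing PostgreSQL behaviour.

```python
import re

# Binary multiples; the accepted suffixes are an assumption of this sketch.
_UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def size_to_blocks(text, block_size=8192):
    """Convert a string such as '25MB' or '1.5GB' into a count of blocks."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(kB|MB|GB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    nbytes = int(float(match.group(1)) * _UNITS[match.group(2)])
    return nbytes // block_size   # round down to whole blocks
```

For the two values above and the default 8k block size, `size_to_blocks("25MB")` gives 3200 and `size_to_blocks("1.5GB")` gives 196608.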

#12Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Vivek Khera (#4)
Re: go for a script! / ex: PostgreSQL vs. MySQL

NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

I think Sean(?) mentioned this one for FreeBSD (Bash code):

echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

I've used it for my dedicated servers. Is this calculation correct?

Chris

#13Sean Chittenden
sean@chittenden.org
In reply to: Christopher Kings-Lynne (#12)
Re: go for a script! / ex: PostgreSQL vs. MySQL

NB> shared_buffers = 1/16th of total memory
NB> effective_cache_size = 80% of the supposed kernel cache.

I think Sean(?) mentioned this one for FreeBSD (Bash code):

sh, not bash. :)

echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

I've used it for my dedicated servers. Is this calculation correct?

Yes, or it's real close at least. vfs.hibufspace is the amount of
kernel space that's used for caching IO operations (minus the
necessary space taken for the kernel). If you're real paranoid, you
could do some kernel profiling and figure out how much of the cache is
actually disk IO and multiply the above by some percentage, say 80%?
I haven't found it necessary to do so yet. Since hibufspace is all IO,
and caching any net activity is kinda pointless, I assume that 100%
of it is used for disk cache and don't use a multiplier. The 8192,
however, is the size of a PG page, so, if you tweak PG's page size,
you have to change this constant (*grumbles*).
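
The same calculation with the page size and the optional multiplier pulled out as parameters might look like the sketch below. It is Python rather than sh, the function and parameter names are invented for the example, and `read_hibufspace` will only work on a FreeBSD box where `sysctl -n vfs.hibufspace` exists.

```python
import subprocess


def read_hibufspace():
    """Ask the kernel for vfs.hibufspace in bytes (FreeBSD only)."""
    out = subprocess.run(["sysctl", "-n", "vfs.hibufspace"],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def effective_cache_size(hibufspace_bytes, page_size=8192, io_percent=100):
    """Pages of kernel buffer space assumed to be caching disk IO.

    page_size replaces the hard-coded 8192; io_percent is the optional
    multiplier (e.g. 80) for the paranoid, 100 meaning "use all of it".
    """
    return hibufspace_bytes * io_percent // 100 // page_size
```

On FreeBSD the one-liner above would then correspond to `effective_cache_size(read_hibufspace())`.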

-sc

--
Sean Chittenden

#14Harald Fuchs
nospam@sap.com
In reply to: Nick Barr (#1)
Re: go for a script! / ex: PostgreSQL vs. MySQL

In article <1065837333.12875.1.camel@jester>,
Rod Taylor <rbt@rbt.ca> writes:

Would anyone object to a patch that exports the blocksize via a
readonly GUC? Too many tunables are page dependent, which is
infuriating when copying configs from DB to DB. I wish pgsql had some
notion of percentages for values that end with a '%'.

Rather than showing the block size, how about we change the tunables to
be physical sizes rather than block based?

effective_cache_size = 1.5GB
shared_buffers = 25MB

Amen! Being forced to set config values in some obscure units rather
than bytes is an ugly braindamage which should be easy to fix.

#15Ron Johnson
ron.l.johnson@cox.net
In reply to: Harald Fuchs (#14)
Re: go for a script! / ex: PostgreSQL vs. MySQL

On Sat, 2003-10-11 at 05:22, Harald Fuchs wrote:

In article <1065837333.12875.1.camel@jester>,
Rod Taylor <rbt@rbt.ca> writes:

Would anyone object to a patch that exports the blocksize via a
readonly GUC? Too many tunables are page dependent, which is
infuriating when copying configs from DB to DB. I wish pgsql had some
notion of percentages for values that end with a '%'.

Rather than showing the block size, how about we change the tunables to
be physical sizes rather than block based?

effective_cache_size = 1.5GB
shared_buffers = 25MB

Amen! Being forced to set config values in some obscure units rather
than bytes is an ugly braindamage which should be easy to fix.

But it's too user-friendly to do it this way!

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

When Swedes start committing terrorism, I'll become suspicious of
Scandanavians.

#16Nick Barr
nicky@chuckie.co.uk
In reply to: Josh Berkus (#5)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Josh Berkus wrote:

shared_buffers = 1/16th of total memory
effective_cache_size = 80% of the supposed kernel cache.

But only if it's a dedicated DB machine. If it's not, all memory values
should be cut in half.

What I would prefer would be an interactive script which would, by asking the
user simple questions and system scanning, collect all the information
necessary to set:

max_connections
shared_buffers
sort_mem
vacuum_mem
effective_cache_size
random_page_cost
max_fsm_pages
checkpoint_segments & checkpoint_timeout
tcpip_socket

and on the OS, it should set:
shmmax & shmall
and should offer to create a cron job which runs VACUUM ANALYZE at an
appropriate frequency.

I reckon do a system scan first, and parse the current PostgreSQL conf
file to figure out what the settings are. Also back it up with a date
and time appended to the end to make sure there is a backup before
overwriting the real conf file. Then a bunch of questions. What sort of
questions would need to be asked and which parameters would these
questions affect? So far, and from my limited understanding of the .conf
file, I reckon there should be the following

Here is the configuration of your hardware as detected. Is this correct?

This could potentially be several questions, i.e. one for proc, mem,
os, hdd etc
Would affect shared_buffers, sort_mem, effective_cache_size,
random_page_cost

How was PostgreSQL compiled?

This would be parameters such as the block size and a few other
compile-time parameters. If we can get to some of these read-only
parameters then that would make this step easier, certainly for the new
recruits amongst us.

Is PostgreSQL the only thing being run on this computer?

Then my previous assumptions about shared_buffers and
effective_cache_size would be true.

If shmmax and shmall are too small, then:

PostgreSQL requires some more shared memory to cache some tables, x MB,
do you want to increase your OS kernel parameters?

Tweak shmmax and shmall

How are the clients going to connect?

i.e. TCP or Unix sockets

How many clients can connect to this database at once?

Affects max_connections

How many databases and how many tables in each database are going to be
present?

Affects max_fsm_pages, checkpoint_segments, checkpoint_timeout

Do you want to vacuum your database regularly?

Initial question for cron job

It is recommended that you vacuum analyze every night, do you want to do
this?
It is also recommended that you vacuum full every month, do you want to
do this?

Thoughts?

Nick

#17Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Nick Barr (#16)
Re: go for a script! / ex: PostgreSQL vs. MySQL

If shmmax and shmall are too small, then:

PostgreSQL requires some more shared memory to cache some tables, x MB,
do you want to increase your OS kernel parameters?

Tweak shmmax and shmall

Note that this still requires a kernel recompile on FreeBSD :(

Chris

#18Josh Berkus
josh@agliodbs.com
In reply to: Nick Barr (#16)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Nick,

I reckon do a system scan first, and parse the current PostgreSQL conf
file to figure out what the settings are. Also back it up with a date
and time appended to the end to make sure there is a backup before
overwriting the real conf file. Then a bunch of questions. What sort of
questions would need to be asked and which parameters would these
questions affect? So far, and from my limited understanding of the .conf
file, I reckon there should be the following

Hmmm ... but I do think that there should be a file to store the user's
previous answers. That way, the script can easily be re-run to fix config
issues.
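
A minimal sketch of those three pieces (timestamped backup, reading the current settings, and an answers file for re-runs) could look like this. It is Python for illustration only, all function names and the JSON answers format are assumptions, and the parser is deliberately naive: it ignores quoting and treats everything after a `#` as a comment.

```python
import json
import shutil
import time
from pathlib import Path


def backup_conf(conf_path):
    """Copy the conf file to <name>.<date-time> and return the backup path."""
    conf_path = Path(conf_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = conf_path.with_name(f"{conf_path.name}.{stamp}")
    shutil.copy2(conf_path, backup)
    return backup


def parse_conf(conf_path):
    """Return {name: value} for every uncommented 'name = value' line."""
    settings = {}
    for line in Path(conf_path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()   # naive: '#' always starts a comment
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


def save_answers(answers, path):
    """Remember the user's answers so the script can be re-run later."""
    Path(path).write_text(json.dumps(answers, indent=2))


def load_answers(path):
    """Load previous answers, or an empty dict on the first run."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}
```

On a re-run, the previous answers would be offered as the defaults for each question instead of starting from scratch.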

Here is the configuration of your hardware as detected. Is this correct?

This could potentially be several questions, i.e. one for proc, mem,
os, hdd etc
Would affect shared_buffers, sort_mem, effective_cache_size,
random_page_cost

Actually, I think this would break down into:
-- Are Proc & Mem correct? If not, type in correct values
-- Is OS correct? If not, select from list
-- Your HDD: is it:
1) IDE
2) Fast multi-disk SCSI or low-end RAID
3) Medium-to-high-end RAID

Other things, we don't care about.

How was PostgreSQL compiled?

This would be parameters such as the block size and a few other
compile-time parameters. If we can get to some of these read-only
parameters then that would make this step easier, certainly for the new
recruits amongst us.

Actually, from my perspective, we shouldn't bother with this; if an admin
knows enough to set an alternate block size for PG, then they know enough to
tweak the conf file by hand. I think we should just issue a warning that
this script:
1) does not work for anyone who is using non-default block sizes;
2) may not work well for anyone using unusual locales, optimization flags, or
other non-default compile options except for language interfaces;
3) cannot produce good settings for embedded systems;
4) will not work well for systems which are extremely low on disk space,
memory, or other resources.
Basically, the script only really needs to work for the people who are
installing PostgreSQL with the default options or from RPM on regular server
or workstation machines with plenty of disk space for normal database
purposes. People who have more complicated setups can read the darned
documentation and tune the conf file by hand.

Is PostgreSQL the only thing being run on this computer?

First, because it affects a couple of other variables:

What kind of database server are you expecting to run?
A) Web Server (many small fast queries from many users, and not much update
activity)
B) Online Transaction Processing (OLTP) database (many small updates
constantly from many users; think "accounting application").
C) Online Analytical Processing (OLAP) database (a few large and complicated
read-only queries aggregating large quantities of data for display)
D) Data Transformation tool (loading large amounts of data to process,
transform, and output to other software)
E) Mixed-Use Database Server (a little of all of the above)
F) Workstation (installing this database on a user machine which also has a
desktop, does word processing, etc.)

If the user answers anything but (F), then we ask:

Will you be running any other significant software on this server, such as a
web server, a Java runtime engine, or a reporting application? (yes|no)

If yes, then:

How much memory do you expect this other software, in total, to regularly use
while PostgreSQL is in use? (# in MB; should offer default of 50% of the RAM
scanned).
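
As a sketch of how that question flow could feed the memory calculations: the follow-up is skipped for answer (F), and the memory claimed by other software defaults to half of the detected RAM. The function names are made up for the example, and what to do in the workstation case is not decided above, so the sketch leaves it out.

```python
def asks_about_other_software(server_type):
    """The follow-up question is asked for every answer except F."""
    return server_type.upper() != "F"


def memory_left_for_postgres(total_ram_mb, runs_other_software, other_mb=None):
    """RAM (in MB) the tuning rules may assume is available to PostgreSQL."""
    if not runs_other_software:
        return total_ram_mb
    if other_mb is None:
        other_mb = total_ram_mb // 2      # offered default: 50% of scanned RAM
    if not 0 <= other_mb < total_ram_mb:
        raise ValueError("other software cannot use all of the RAM")
    return total_ram_mb - other_mb
```

The result would then replace "total memory" in rules such as shared_buffers = 1/16th of total memory.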

How are the clients going to connect?

i.e. TCP or Unix sockets

We should warn them that they will still need to configure pg_hba.conf.

How many clients can connect to this database at once?

Affects max_connections

Should add a parenthetical comment that for applications which use pooled
connections, or intermittent connections, such as Web applications, the number
of concurrent connections is often much lower than the number of concurrent
users.

How many databases and how many tables in each database are going to be
present?

Affects max_fsm_pages, checkpoint_segments, checkpoint_timeout

Also need to ask if they have an idea of the total size of all databases, in
MB or GB, which has a stronger relationship to those variables.

Also, this will give us a chance to check the free space on the PGDATA
partition, and kick the user out with a warning if there is not at least
2xExpected Size available.
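
The free-space check is easy to sketch. In the Python below the function name is hypothetical, and the path would be whatever PGDATA points at; the factor of 2 is the "2xExpected Size" rule above.

```python
import shutil


def has_room_for_databases(pgdata_path, expected_db_mb, factor=2):
    """True if the partition holding pgdata_path has factor x expected size free."""
    free_mb = shutil.disk_usage(pgdata_path).free // (1024 * 1024)
    return free_mb >= factor * expected_db_mb
```

A script would print the warning and exit when this returns False, before touching any configuration.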

Do you want to vacuum your database regularly?

Initial question for cron job

It is recommended that you vacuum analyze every night, do you want to do
this?
It is also recommended that you vacuum full every month, do you want to
do this?

Depends on size/type of database. For large OLTP databases, I recommend
vacuum as often as every 5 minutes, analyze every hour, and Vacuum Full +
Reindex once a week. For a workstation database, your frequencies are
probably OK.
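
Something along these lines could generate the cron entries for the two cases. It is only a sketch: the function name is invented, the times of day are arbitrary picks, the hourly entry uses `vacuumdb --analyze` (which vacuums as well as analyzes), and the weekly reindex is left out.

```python
def vacuum_crontab(large_oltp):
    """Return crontab lines for the vacuum schedules discussed above."""
    if large_oltp:
        return [
            "*/5 * * * * vacuumdb --all --quiet",            # every 5 minutes
            "0 * * * * vacuumdb --all --analyze --quiet",    # hourly, with analyze
            "0 3 * * 0 vacuumdb --all --full --quiet",       # weekly, Sunday 03:00
        ]
    return [
        "0 2 * * * vacuumdb --all --analyze --quiet",        # nightly at 02:00
        "0 4 1 * * vacuumdb --all --full --quiet",           # monthly, 1st at 04:00
    ]
```

The lines would go into the crontab of the user that owns the PostgreSQL installation, after asking for confirmation.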

--
Josh Berkus
Aglio Database Solutions
San Francisco

#19Josh Berkus
josh@agliodbs.com
In reply to: Christopher Kings-Lynne (#17)
Re: go for a script! / ex: PostgreSQL vs. MySQL

Chris,

PostgreSQL requires some more shared memory to cache some tables, x MB,
do you want to increase your OS kernel parameters?

Tweak shmmax and shmall

Note that this still requires a kernel recompile on FreeBSD :(

Not our fault, now is it? This would mean that we wouldn't be able to script
for FreeBSD. Bug the FreeBSD developers.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#20Ron Johnson
ron.l.johnson@cox.net
In reply to: Josh Berkus (#19)
Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

On Sun, 2003-10-12 at 15:31, Josh Berkus wrote:

Chris,

PostgreSQL requires some more shared memory to cache some tables, x MB,
do you want to increase your OS kernel parameters?

Tweak shmmax and shmall

Note that this still requires a kernel recompile on FreeBSD :(

Not our fault, now is it? This would mean that we wouldn't be able to script
for FreeBSD. Bug the FreeBSD developers.

<TROLL=HAND-GRENADE>
Or use a good OS, instead.
</TROLL>

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"Oh, great altar of passive entertainment, bestow upon me thy
discordant images at such speed as to render linear thought
impossible"
Calvin, regarding TV

#21pginfo
pginfo@t1.unisoftbg.com
In reply to: Nick Barr (#1)
#22Vivek Khera
khera@kcilink.com
In reply to: Sean Chittenden (#13)
#23Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Vivek Khera (#22)
#24ingrim
ingridm@qoslabs.com
In reply to: Vivek Khera (#7)
#25Vivek Khera
khera@kcilink.com
In reply to: Nick Barr (#1)
#26Vivek Khera
khera@kcilink.com
In reply to: Nick Barr (#1)
#27Sean Chittenden
sean@chittenden.org
In reply to: Vivek Khera (#22)
#28Sean Chittenden
sean@chittenden.org
In reply to: Josh Berkus (#19)
#29Chris Browne
cbbrowne@acm.org
In reply to: Nick Barr (#1)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#28)
#31Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Vivek Khera (#25)
#32Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Chris Browne (#29)