tuning questions
Hi,
sorry for duplication, I asked this on pgsql-admin first before
realizing it wasn't the appropriate list.
I'm having trouble optimizing PostgreSQL for an admittedly heinous
worst-case scenario load.
testbed:
dual P3 1.3 GHz box with 2GB RAM
two IDE 120G drives on separate channels (DMA on), OS on one, DB on the
other, some swap on each (totalling 2.8G).
RH Linux 8.
I've installed PG 7.3.4 from source (./configure && make && make
install) and from PGDG RPMs and can switch back and forth. I also have
the 7.4 source but haven't done any testing with it yet aside from
starting it and importing some data.
The application is on another server, and does this torture test: it
builds a large table (~6 million rows in one test, ~18 million in
another). Rows are then pulled in chunks of 4 to 6 thousand, acted on,
and inserted back into another table (which will of course eventually
grow to the full size of the first).
The problem is that pulling the 4 to 6 thousand rows puts PostgreSQL
into a tail spin: postmaster hammers on CPU anywhere from 90 seconds to
five minutes before returning the data. During this time vmstat shows
that disk activity is up of course, but it doesn't appear to be with
page swapping (free and top and vmstat).
Another problem is that performance of the 6 million row job is decent
if I stop the job and run a vacuumdb --analyze before letting it
continue; is this something that 7.4 will help with? vacuumdb --analyze
doesn't seem to have much effect on the 18 million row job.
I've tweaked shared buffers to 8192, pushed sort memory to 2048, vacuum
memory to 8192, and effective cache size to 10000.
/proc/sys/kernel/shmmax is set to 1600000000 and /proc/sys/fs/file-max
is set to 65536. Ulimit -n 3192.
I've read several sites and postings on tuning PG and have tried a
number of different theories, but I'm still not getting the architecture
of how things work.
thanks,
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan
Jack,
The application is on another server, and does this torture test: it
builds a large table (~6 million rows in one test, ~18 million in
another). Rows are then pulled in chunks of 4 to 6 thousand, acted on,
and inserted back into another table (which will of course eventually
grow to the full size of the first).
I've tweaked shared buffers to 8192, pushed sort memory to 2048, vacuum
memory to 8192, and effective cache size to 10000.
/proc/sys/kernel/shmmax is set to 1600000000 and /proc/sys/fs/file-max
is set to 65536. Ulimit -n 3192.
Have you read this?
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
Actually, your situation is not "worst case". For one thing, your process is
effectively single-user; this allows you to throw all of your resources at
one user. The problem is that your settings have effectively throttled PG
at a level appropriate to a many-user and/or multi-purpose system. You need
to "open them up".
For something involving massive updating/transformation like this, once you've
done the basics (see that URL above) the main settings which will affect you
are sort_mem and checkpoint_segments, both of which I'd advise jacking way up
(test by increments). Raising wal_buffers wouldn't hurt either.
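As a first cut, something like this in postgresql.conf -- the numbers are
only rough guesses to start incrementing from, not tested recommendations:

    checkpoint_segments = 32   # default is 3; each segment is 16MB of WAL
    wal_buffers = 16
    sort_mem = 32768           # in kB, and it applies per sort, not per backend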
Also, give some thought to running VACUUM and/or ANALYZE between segments of
your procedure. Particularly if you do updates to many rows of a table and
then query based on the changed data, it is vital to run an ANALYZE first,
and usually a good idea to run a VACUUM if it was an UPDATE or DELETE and not
an INSERT.
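Concretely, something shaped like this between steps (table names invented
for the example):

    INSERT INTO dest_table SELECT ... FROM work_table;
    ANALYZE dest_table;          -- refresh the planner's statistics
    -- and after a big UPDATE or DELETE pass:
    VACUUM ANALYZE work_table;   -- reclaim the dead tuples too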
--
Josh Berkus
Aglio Database Solutions
San Francisco
On Thu, 04 Dec 2003 08:06:23 -0800
Jack Coates <jack@lyris.com> wrote:
testbed:
dual P3 1.3 GHz box with 2GB RAM
two IDE 120G drives on separate channels (DMA on), OS on one, DB on
the other, some swap on each (totalling 2.8G).
RH Linux 8.
Side Note: be sure to turn off write caching on those disks or you may
have data corruption in the event of a failure
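(On Linux that's usually done with hdparm's -W flag, e.g. "hdparm -W0
/dev/hda" -- check the man page for your version.)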
The problem is that pulling the 4 to 6 thousand rows puts PostgreSQL
into a tail spin: postmaster hammers on CPU anywhere from 90 seconds
to five minutes before returning the data. During this time vmstat
shows that disk activity is up of course, but it doesn't appear to be
with page swapping (free and top and vmstat).
Have you tried modifying the app to retrieve the rows in smaller chunks
(using a cursor)? That way it only needs to allocate memory to hold, say,
100 rows at a time instead of 6000.
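A minimal sketch of the cursor approach (names invented):

    BEGIN;
    DECLARE batch CURSOR FOR SELECT * FROM work_table;
    FETCH 100 FROM batch;   -- repeat until no rows come back
    CLOSE batch;
    COMMIT;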
Also, have you explain analyze'd your queries to make sure PG is picking
a good plan to execute?
I've tweaked shared buffers to 8192, pushed sort memory to 2048,
vacuum memory to 8192, and effective cache size to 10000.
/proc/sys/kernel/shmmax is set to 1600000000 and /proc/sys/fs/file-max
is set to 65536. Ulimit -n 3192.
You should set effective cache size bigger, especially with 2GB of
memory. effective_cache_size tells PG 'about' how much data it can
expect the OS to cache.
and.. I'm not sure about your query, but perhaps the sort of those 6000
rows is spilling to disk? If you look in explain analyze you'll see in
the "Sort" step(s) it will tell you how many rows and how "wide" they
are. If rows * width > sort_mem, it will have to spill the sort to
disk, which is slow.
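For example, with sort_mem at 2048 (kB), a plan line such as

    Sort  (cost=... rows=6000 width=350) ...

would mean roughly 6000 * 350 = ~2.1MB of data to sort -- just over the 2MB
limit, so it spills. (Those numbers are made up purely to show the
arithmetic.)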
If you post query info and explain analyze's we can help optimize the
query itself.
--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
I've tweaked shared buffers to 8192, pushed sort memory to 2048, vacuum
memory to 8192, and effective cache size to 10000.
/proc/sys/kernel/shmmax is set to 1600000000 and /proc/sys/fs/file-max
is set to 65536. Ulimit -n 3192.
Your shared memory is too high, and not even being used effectively. Your
other settings are too low.
Ball park guessing here, but I'd say first read (and understand) this:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
Then make shared memory about 10-20% available ram, and set:
((shmmax/1024) - ( 14.2 * max_connections ) - 250 ) / 8.2 = shared_buffers
decrease random_page_cost to 0.3 and whack up sort mem by 16 times,
effective cache size to about 50% RAM (depending on your other settings)
and try that for starters.
--
Rob Fielding
rob@dsvr.net
www.dsvr.co.uk Development Designer Servers Ltd
On Thu, 4 Dec 2003, Jack Coates wrote:
Another problem is that performance of the 6 million row job is decent
if I stop the job and run a vacuumdb --analyze before letting it
continue; is this something that 7.4 will help with? vacuumdb --analyze
doesn't seem to have much effect on the 18 million row job.
Just to add to what the others have said here, you probably want to run
the pg_autovacuum daemon in the background. It comes with 7.4 but will
work fine with 7.3.
On Thu, Dec 04, 2003 at 11:59:32AM -0500, Jeff wrote:
On Thu, 04 Dec 2003 08:06:23 -0800
Jack Coates <jack@lyris.com> wrote:
testbed:
dual P3 1.3 GHz box with 2GB RAM
two IDE 120G drives on separate channels (DMA on), OS on one, DB on
the other, some swap on each (totalling 2.8G).
RH Linux 8.
Side Note: be sure to turn off write caching on those disks or you may
have data corruption in the event of a failure
I've seen this comment several times from different people.
Would someone care to explain how you would get data corruption? I
thought that the whole idea of the log is to provide a journal similar
to what you get in a journaling file system.
In other words, the db writes a series of transactions to the log and marks
that "log entry" (don't know the right nomenclature) as valid. When the db
crashes, it reads the log, and discards the last "log entry" if it wasn't
marked as valid, and "replays" any transactions that haven't been
committed to the db. The end result being that you might lose your last
transaction(s) if the db crashes, but nothing ever gets corrupted.
So what am I missing in this picture?
Regards,
Dror
--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
Scott,
Just to add to what the others have said here, you probably want to run
the pg_autovacuum daemon in the background. It comes with 7.4 but will
work fine with 7.3.
I don't recommend using pg_autovacuum with a data transformation task. pg_av
is designed for "regular use" not huge batch tasks.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
If I understand the problem correctly, the issue is that IDE drives
signal that data has been written to disk when they actually are holding
the data in the write cache. In the case of a power down (and I remember
someone showing some test results confirming this, check the list
archive) the data in the drive write cache is lost, resulting in
corrupted logs.
Anyone else have more details?
Jord Tanner
On Thu, 2003-12-04 at 09:57, Dror Matalon wrote:
On Thu, Dec 04, 2003 at 11:59:32AM -0500, Jeff wrote:
Side Note: be sure to turn off write caching on those disks or you may
have data corruption in the event of a failure
I've seen this comment several times from different people.
Would someone care to explain how you would get data corruption? I
thought that the whole idea of the log is to provide a journal similar
to what you get in a journaling file system.
So what am I missing in this picture?
Regards,
Dror
--
Jord Tanner <jord@indygecko.com>
On Thu, Dec 04, 2003 at 09:57:38AM -0800, Dror Matalon wrote:
I've seen this comment several times from different people.
Would someone care to explain how you would get data corruption? I
thought that the whole idea of the log is to provide a journal similar
to what you get in a journaling file system.
So what am I missing in this picture?
That a journalling file system can _also_ have file corruption if you
have write caching enabled and no battery back up. If the drive
tells the OS, "Yep! It's all on the disk!" but it is _not_ actually
scribed in the little bitty magnetic patterns -- and at that very
moment, the power goes away -- the data that was reported to have been
on the disk, but which was actually _not_ on the disk, is no longer
anywhere. (Well, except in the past. But time travel was disabled
some versions ago. ;-)
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Afilias Canada Toronto, Ontario Canada
<andrew@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
On Thu, 4 Dec 2003, Josh Berkus wrote:
Scott,
Just to add to what the others have said here, you probably want to run
the pg_autovacuum daemon in the background. It comes with 7.4 but will
work fine with 7.3.
I don't recommend using pg_autovacuum with a data transformation task. pg_av
is designed for "regular use" not huge batch tasks.
What bad thing is likely to happen if it's used here? Would it fire too
often, or use too much I/O bandwidth? Would that be fixed by the patch
being tested that introduces a delay every x pages of vacuuming?
On Thu, 2003-12-04 at 09:12, Rob Fielding wrote:
I've tweaked shared buffers to 8192, pushed sort memory to 2048, vacuum
memory to 8192, and effective cache size to 10000.
/proc/sys/kernel/shmmax is set to 1600000000 and /proc/sys/fs/file-max
is set to 65536. Ulimit -n 3192.
Your shared memory is too high, and not even being used effectively. Your
other settings are too low.
Ball park guessing here, but I'd say first read (and understand) this:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
I've read it many times, understanding is slower :-)
Then make shared memory about 10-20% available ram, and set:
((shmmax/1024) - ( 14.2 * max_connections ) - 250 ) / 8.2 = shared_buffers
decrease random_page_cost to 0.3 and whack up sort mem by 16 times,
effective cache size to about 50% RAM (depending on your other settings)
and try that for starters.
Following this, I've done:
2 GB RAM            = 2,000,000,000 bytes
15% of that         =   300,000,000 bytes
divided by 1024     =       292,969 kbytes
max_conn * 14.2     =           454 kbytes
subtract the above  =       292,514 kbytes
subtract 250        =       292,264 kbytes
divide by 8.2       =        35,642 shared_buffers
performance is unchanged for the 18M job -- pg continues to use ~
285-300M, system load and memory usage stay the same. I killed that,
deleted from the affected tables, inserted a 6M job, and started a
vacuumdb --analyze. It's been running for 20 minutes now...
getting the SQL query better optimized for PG is on my todo list, but
not something I can do right now -- this application is designed to be
cross-platform with MS-SQL, PG, and Oracle so tweaking SQL is a touchy
subject.
The pgavd conversation is intriguing, but I don't really understand the
role of vacuuming. Would this be a correct statement: "PG needs to
regularly re-evaluate the database in order to adjust itself?" I'm
imagining that it continues to treat the table as a small one until
vacuum informs it that the table is now large?
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan
On Dec 4, 2003, at 10:11 AM, Andrew Sullivan wrote:
On Thu, Dec 04, 2003 at 09:57:38AM -0800, Dror Matalon wrote:
I've seen this comment several times from different people. [...]
So what am I missing in this picture?
That a journalling file system can _also_ have file corruption if you
have write caching enabled and no battery back up. If the drive
tells the OS, "Yep! It's all on the disk!" but it is _not_ actually
scribed in the little bitty magnetic patterns -- and at that very
moment, the power goes away -- the data that was reported to have been
on the disk, but which was actually _not_ on the disk, is no longer
anywhere. [...]
It's not just a theoretical problem. It's happened to me on a laptop
drive in the last week or so.
I was testing out dbmail by hammering on it on a Panther laptop, hfs+
journaling enabled, psql 7.4, latest and greatest. I managed to hang
the system hard, requiring a reboot. Psql wouldn't start after the
crash, complaining of a damaged relation and helpfully telling me that
'you may need to restore from backup'.
No big deal on the data loss, since it was a test/hammering
installation. It would have been nice to be able to drop that relation
or prune the entire database, but I'm sure that would ultimately run
into referential integrity problems.
eric
Jack,
Following this, I've done:
2 GB RAM = 2,000,000,000 bytes
This calculation is fun, but I really don't know where you got it from. It
seems quite baroque. What are you trying to set, exactly?
getting the SQL query better optimized for PG is on my todo list, but
not something I can do right now -- this application is designed to be
cross-platform with MS-SQL, PG, and Oracle so tweaking SQL is a touchy
subject.
Well, if your queries are screwed up, no amount of .conf optimization is
going to help you much. You could criticize that PG is less adept than
some other systems at re-writing "bad queries", and you would be correct.
However, there's not much to do about that on existing systems.
How about posting some sample code?
The pgavd conversation is intriguing, but I don't really understand the
role of vacuuming. Would this be a correct statement: "PG needs to
regularly re-evaluate the database in order to adjust itself?" I'm
imagining that it continues to treat the table as a small one until
vacuum informs it that the table is now large?
Not Vacuum, Analyze. Otherwise correct. Mind you, in "regular use" where
only a small % of the table changes per hour, periodic ANALYZE is fine.
However, in "batch data transform" analyze statements need to be keyed to the
updates and/or imports.
BTW, I sent a couple of e-mails to the Lyris documentation maintainer about
updating out-of-date information about setting up PostgreSQL. I never got a
response, and I don't think my changes were made.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
On Thu, 2003-12-04 at 11:20, Josh Berkus wrote:
Jack,
Following this, I've done:
2 GB RAM = 2,000,000,000 bytes
This calculation is fun, but I really don't know where you got it from. It
seems quite baroque. What are you trying to set, exactly?
Message-ID: <3FCF6AEB.908@dsvr.net>
Date: Thu, 04 Dec 2003 17:12:11 +0000
From: Rob Fielding <rob@dsvr.net>
I'm trying to set Postgres's shared memory usage in a fashion that
allows it to return requested results quickly. Unfortunately, none of
these changes allow PG to use more than a little under 300M RAM.
vacuumdb --analyze is now taking an inordinate amount of time as well
(40 minutes and counting), so that change needs to be rolled back.
getting the SQL query better optimized for PG is on my todo list, but
not something I can do right now -- this application is designed to be
cross-platform with MS-SQL, PG, and Oracle so tweaking SQL is a touchy
subject.
Well, if your queries are screwed up, no amount of .conf optimization is
going to help you much. You could criticize that PG is less adept than
some other systems at re-writing "bad queries", and you would be correct.
However, there's not much to do about that on existing systems.
How about posting some sample code?
Tracking that down in CVS and translating from C++ is going to take a
while -- is there a way to get PG to log the queries it's receiving?
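Digging around, the log_statement setting in postgresql.conf looks like it
might do this, though I still need to verify it against the 7.3 docs:

    log_statement = true   # log each statement as the server receives it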
The pgavd conversation is intriguing, but I don't really understand the
role of vacuuming. Would this be a correct statement: "PG needs to
regularly re-evaluate the database in order to adjust itself?" I'm
imagining that it continues to treat the table as a small one until
vacuum informs it that the table is now large?
Not Vacuum, Analyze. Otherwise correct. Mind you, in "regular use" where
only a small % of the table changes per hour, periodic ANALYZE is fine.
However, in "batch data transform" analyze statements need to be keyed to the
updates and/or imports.
BTW, I sent a couple of e-mails to the Lyris documentation maintainer about
updating out-of-date information about setting up PostgreSQL. I never got a
response, and I don't think my changes were made.
She sits on the other side of the cube wall from me, and if I find a
decent config it's going into the manual -- consider this a golden
opportunity :-)
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan
On Thursday 04 December 2003 19:50, Jack Coates wrote:
I'm trying to set Postgres's shared memory usage in a fashion that
allows it to return requested results quickly. Unfortunately, none of
these changes allow PG to use more than a little under 300M RAM.
vacuumdb --analyze is now taking an inordinate amount of time as well
(40 minutes and counting), so that change needs to be rolled back.
You don't want PG to use all your RAM, it's designed to let the underlying OS
do a lot of caching for it. Probably worth having a look at vmstat/iostat and
see if it's saturating on I/O.
--
Richard Huxton
Archonet Ltd
On Thu, 2003-12-04 at 12:27, Richard Huxton wrote:
On Thursday 04 December 2003 19:50, Jack Coates wrote:
I'm trying to set Postgres's shared memory usage in a fashion that
allows it to return requested results quickly. Unfortunately, none of
these changes allow PG to use more than a little under 300M RAM.
vacuumdb --analyze is now taking an inordinate amount of time as well
(40 minutes and counting), so that change needs to be rolled back.
You don't want PG to use all your RAM, it's designed to let the underlying OS
do a lot of caching for it. Probably worth having a look at vmstat/iostat and
see if it's saturating on I/O.
latest changes:
shared_buffers = 35642
max_fsm_relations = 1000
max_fsm_pages = 10000
wal_buffers = 64
sort_mem = 32768
vacuum_mem = 32768
effective_cache_size = 10000
/proc/sys/kernel/shmmax = 500000000
IO is active, but hardly saturated. CPU load is hefty though, load
average is at 4 now.
procs                      memory     swap          io     system        cpu
 r  b  w  swpd   free   buff    cache  si  so   bi   bo   in   cs  us sy id
 0  2  1  2808  11436  39616  1902988   0   0  240  896  765  469   2 11 87
 0  2  1  2808  11432  39616  1902988   0   0  244  848  768  540   4  3 93
 0  2  1  2808  11432  39616  1902984   0   0  204  876  788  507   3  4 93
 0  2  1  2808  11432  39616  1902984   0   0  360  416  715  495   4  1 96
 0  2  1  2808  11432  39616  1902984   0   0  376  328  689  441   2  1 97
 0  2  0  2808  11428  39616  1902976   0   0  464  360  705  479   2  1 97
 0  2  1  2808  11428  39616  1902976   0   0  432  380  718  547   3  1 97
 0  2  1  2808  11428  39616  1902972   0   0  440  372  742  512   1  3 96
 0  2  1  2808  11428  39616  1902972   0   0  416  364  711  504   3  1 96
 0  2  1  2808  11424  39616  1902972   0   0  456  492  743  592   2  1 97
 0  2  1  2808  11424  39616  1902972   0   0  440  352  707  494   2  1 97
 0  2  1  2808  11424  39616  1902972   0   0  456  360  709  494   2  2 97
 0  2  1  2808  11436  39616  1902968   0   0  536  516  807  708   3  2 94
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan
On Thu, 4 Dec 2003, Jack Coates wrote:
On Thu, 2003-12-04 at 12:27, Richard Huxton wrote:
On Thursday 04 December 2003 19:50, Jack Coates wrote:
I'm trying to set Postgres's shared memory usage in a fashion that
allows it to return requested results quickly. Unfortunately, none of
these changes allow PG to use more than a little under 300M RAM.
vacuumdb --analyze is now taking an inordinate amount of time as well
(40 minutes and counting), so that change needs to be rolled back.
You don't want PG to use all your RAM, it's designed to let the underlying OS
do a lot of caching for it. Probably worth having a look at vmstat/iostat and
see if it's saturating on I/O.
latest changes:
shared_buffers = 35642
max_fsm_relations = 1000
max_fsm_pages = 10000
wal_buffers = 64
sort_mem = 32768
vacuum_mem = 32768
effective_cache_size = 10000
/proc/sys/kernel/shmmax = 500000000
IO is active, but hardly saturated. CPU load is hefty though, load
average is at 4 now.
Postgresql is busily managing a far too large shared buffer. Let the
kernel do that. Postgresql's shared buffers should be big enough to hold
as much of the current working set as it can, up to about 25% or so of the
server's memory, or 512Meg, whichever comes first. Unless a single query
will actually use all of the buffer at once, you're not likely to see an
improvement.
Also, your effective cache size is really small. On a typical Postgresql
server with 2 gigs of ram, you'll have about 1 to 1.5 gigs as kernel cache
and buffer, and if it's dedicated to postgresql, then the effective cache
setting for 1 gig would be 131072 (assuming 8k pages).
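For example, assuming roughly 1.5 gigs of this 2 gig box ends up as kernel
cache:

    -- 1.5GB = 1572864 kB; divided by 8 kB pages = 196608
    SET effective_cache_size = 196608;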
If you're updating a lot of tuples without vacuums, you'll likely want to
up your fsm settings.
Note you can change things like sort_mem, effective_cache_size and
random_page_cost on the fly, but not shared buffers or the fsm settings --
those are allocated at startup.
so, if you're gonna have one huge honkin query that needs to sort a
hundred megs at a time, but you'd rather not up your sort memory that high
(sort mem is PER SORT, not per backend or per database, so it can get out
of hand quickly) then you can just
set sort_mem=128000;
before throwing out the big queries that need all the sort.
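i.e. wrap the batch like:

    SET sort_mem = 128000;   -- session-local, in kB
    -- ... run the big transform queries here ...
    RESET sort_mem;          -- drop back to the postgresql.conf value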
Jack,
latest changes:
shared_buffers = 35642
This is fine, it's about 14% of available RAM. Though the way you calculated
it still confuses me. It's not complicated; it should be between 6% and 15%
of available RAM; since you're doing a data-transformation DB, yours should
be toward the high end.
max_fsm_relations = 1000
max_fsm_pages = 10000
You want to raise this a whole lot if your data transformations involve large
delete or update batches. I'd suggest running "vacuum analyze verbose"
between steps to see how many dead pages you're accumulating.
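e.g. (note the old syntax wants VERBOSE before ANALYZE; the table name here
is a stand-in):

    VACUUM VERBOSE ANALYZE dest_table;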
wal_buffers = 64
sort_mem = 32768
vacuum_mem = 32768
effective_cache_size = 10000
This is way the heck too low. It's supposed to be the size of all available
RAM; I'd set it to 2GB*65% as a start.
IO is active, but hardly saturated. CPU load is hefty though, load
average is at 4 now.
Unless you're doing huge statistical aggregates (like radar charts), or heavy
numerical calculations-by-query, high CPU and idle I/O usually indicates a
really bad query, like badly mismatched data types on a join or unconstrained
joins or overblown formatting-by-query.
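The classic 7.x example of a type mismatch (made-up table, real behavior):
comparing an int2 column to a plain integer literal keeps the planner off
the index --

    SELECT * FROM t WHERE smallint_col = 42;         -- literal is int4: seq scan
    SELECT * FROM t WHERE smallint_col = 42::int2;   -- cast allows the index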
--
-Josh Berkus
Aglio Database Solutions
San Francisco
IO is active, but hardly saturated. CPU load is hefty though, load
average is at 4 now.
 r  b  w  swpd   free   buff    cache  si  so   bi   bo   in   cs  us sy id
 0  2  1  2808  11432  39616  1902984   0   0  204  876  788  507   3  4 93
You're getting a load average of 4 with 93% idle?
That's a reasonable number of context switches, and if the blocks
you're reading/writing are discontinuous, I could see io saturation
rearing its head.
This looks to me like you're starting and killing a lot of processes.
Is this thrashing psql connections, or is it one big query? What are
your active processes?
Your effective cache size looks to be about 1900 megs (+- binary),
assuming all of it is pg.
eric
On Thu, 2003-12-04 at 13:24, Josh Berkus wrote:
Jack,
latest changes:
shared_buffers = 35642
This is fine, it's about 14% of available RAM. Though the way you calculated
it still confuses me. It's not complicated; it should be between 6% and 15%
of available RAM; since you're doing a data-transformation DB, yours should
be toward the high end.
max_fsm_relations = 1000
max_fsm_pages = 10000
You want to raise this a whole lot if your data transformations involve large
delete or update batches. I'd suggest running "vacuum analyze verbose"
between steps to see how many dead pages you're accumulating.
This looks really difficult to tune, and based on the load I'm giving
it, it looks really important. I've tried the verbose analyze and I've
looked at the rules of thumb, neither approach seems good for the
pattern of "hammer the system for a day or two, then leave it alone for
a week." I'm setting it to 500000 (half of the biggest table size
divided by a 6k page size), but I'll keep tweaking this.
wal_buffers = 64
sort_mem = 32768
vacuum_mem = 32768
effective_cache_size = 10000
This is way the heck too low. It's supposed to be the size of all available
RAM; I'd set it to 2GB*65% as a start.
This makes a little bit of difference. I set it to 65% (15869 pages).
Now we have some real disk IO:
procs                      memory     swap           io      system       cpu
 r  b  w  swpd   free   buff    cache  si  so     bi   bo    in    cs  us sy id
 0  3  1  2804  10740  40808  1899856   0   0  26624    0   941  4144  13 24 63
 1  2  1  2804  10808  40808  1899848   0   0  21748   60  1143  3655   9 22 69
still high cpu (3-ish load) though, and there's no noticeable
improvement in query speed.
IO is active, but hardly saturated. CPU load is hefty though, load
average is at 4 now.
Unless you're doing huge statistical aggregates (like radar charts), or heavy
numerical calculations-by-query, high CPU and idle I/O usually indicates a
really bad query, like badly mismatched data types on a join or unconstrained
joins or overblown formatting-by-query.
Ran that by the programmer responsible for this area and watched the
statements go by with tcpdump -X. Looks like really simple stuff to me:
select a handful of values, then insert into one table and delete from
another.
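Per chunk it's roughly this shape (names changed):

    SELECT ... FROM incoming WHERE ...;   -- pull the 4-6k rows
    INSERT INTO outgoing VALUES (...);    -- write the transformed rows
    DELETE FROM incoming WHERE ...;       -- drop the processed rows

which, per the advice upthread, means "incoming" piles up dead tuples and
"outgoing" outgrows its statistics -- exactly the VACUUM/ANALYZE case.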
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan