filesystem option tuning

Started by CHalmost 22 years ago5 messagesgeneral

share-postgres@think42.com

almost 22 years ago

Hi All,

I have recently started evaluating Postgresql 7.4.2 to replace some *cough*
more proprietary database systems... Thanks to the _excellent_ documentation
(a point I cannot overemphasize) I was up and running in no time, and got a
first test application running on the native C interface.

There is just one point where I found the documentation lacking any
description and practical hints (as opposed to all other topics), namely
that of how to tune a setup for maximum performance regarding the layout of
partitions on hard-disks and their mount options.

I gather that the pg_xlog directory contains the transaction log and would
benefit greatly from being put on a separate partition. I would then mount
that partition with the noatime and forcedirectio options (on Solaris, the
latter to circumvent the OS' buffer cache)? On the other hand the data
partition should not be mounted with direct io, since Postgresql is
documented as relying heavily on the OS' cache?

Then I was wondering whether the fsync option refers only to the wal log (is
that another name for the xlog, or is one a subset of the other?), or also
to data write operations? With forcedirectio for the wal, do I still need
fsync (or O_SYNC...) because otherwise I could corrupt the data?

Are there any other directories that might benefit from being put on a
dedicated disk, and with which mount options? Even without things like
tablespaces there should be some headroom over having everything on one
partition like in the default setup.

What I should add is that reliability is a premium for us, we do not want to
sacrifice integrity for speed, and that we are tuning for a high commit rate
of small, simple transactions...

I would be greatly thankful if somebody could give me some hints or pointers
to further documentation as my search on the web did not show up much.

Regards, Colin

Richard Huxton

dev@archonet.com

almost 22 years ago

In reply to: CH (#1)

Re: filesystem option tuning

share-postgres@think42.com wrote:

Hi All,

I have recently started evaluating Postgresql 7.4.2 to replace some *cough*
more proprietary database systems... Thanks to the _excellent_ documentation
(a point I cannot overemphasize) I was up and running in no time, and got a
first test application running on the native C interface.

In no official capacity whatsoever, welcome aboard.

There is just one point where I found the documentation lacking any
description and practical hints (as opposed to all other topics), namely
that of how to tune a setup for maximum performance regarding the layout of
partitions on hard-disks and their mount options.

I'm not a Sun user, so I can't give any OS-specific notes, but in general:
- Don't bypass the filesystem, but feel free to tinker with mount
options if you think it will help
- If you can put WAL on separate disk(s), all the better.
- The general opinion seems to be RAID5 is slower than RAID10 unless
you have a lot of disks
- Battery-backed write-cache for your SCSI controller can be a big
performance win
- Tablespaces _should_ be available in the next release of PG, we'll
know for sure soon. That might make life simpler for you if you do want
to spread your database around by hand,

What I should add is that reliability is a premium for us, we do not want to
sacrifice integrity for speed, and that we are tuning for a high commit rate
of small, simple transactions...

Make sure the WAL is on fast disks I'd suggest. At a guess that'll be
your bottleneck.

For more info, your best bet is to check the archives on the
plpgsql-performance list, and then post there. People will probably want
to know more about your database size/number of concurrent
transactions/disk systems etc.

HTH
--
Richard Huxton
Archonet Ltd

share-postgres@think42.com

almost 22 years ago

In reply to: Richard Huxton (#2)

Re: filesystem option tuning

Hi!

On Mon, May 17, 2004 at 06:04:54PM +0100, Richard Huxton wrote:

share-postgres@think42.com wrote:

[...]

In no official capacity whatsoever, welcome aboard.

Thanks ;-)

There is just one point where I found the documentation lacking any
description and practical hints (as opposed to all other topics), namely
that of how to tune a setup for maximum performance regarding the layout of
partitions on hard-disks and their mount options.

I'm not a Sun user, so I can't give any OS-specific notes, but in general:
- Don't bypass the filesystem, but feel free to tinker with mount
options if you think it will help

Right, raw partitions are too low-level for me these days anyhow...
I assume that all postgres partitions can be mounted with noatime?

- If you can put WAL on separate disk(s), all the better.

Does that mean only the xlog, or also the clog? As far as I understand, the
clog contains some meta-information on the xlog, so presumably it is flushed
to disc synchronously together with the xlog? That would mean that they each
need a separate disk to prevent one disk having to seek too often...?

- Battery-backed write-cache for your SCSI controller can be a big
performance win

I probably won't be able to get such a setup for this project; that's why I
am bothering about which disk will be seeking how often.

- Tablespaces _should_ be available in the next release of PG, we'll
know for sure soon. That might make life simpler for you if you do want
to spread your database around by hand,

Ok, I think tablespaces are not the important thing - at least for this
project of ours.

What I should add is that reliability is a premium for us, we do not want to
sacrifice integrity for speed, and that we are tuning for a high commit rate
of small, simple transactions...

Make sure the WAL is on fast disks I'd suggest. At a guess that'll be
your bottleneck.

For more info, your best bet is to check the archives on the
plpgsql-performance list, and then post there. People will probably want
to know more about your database size/number of concurrent
transactions/disk systems etc.

Here goes ... we are talking about a database cluster with two tables where
things are happening, one is a kind of log that is simply "appended" to and
will expect to reach a size of several million entries in the time window
that is kept, the other is a persistent backing of application data that
will mostly see read-modify-writes of single records. Two writers to the
history, one writer to the data table. The volume of data is not very high
and RAM is enough...

If any more information is required feel free to ask - I would really
appreciate getting this disk layout sorted out.

Thanks,
Colin

Shridhar Daithankar

shridhar@frodo.hserus.net

almost 22 years ago

In reply to: CH (#3)

Re: filesystem option tuning

On Wednesday 19 May 2004 13:02, share-postgres@think42.com wrote:

- If you can put WAL on separate disk(s), all the better.

Does that mean only the xlog, or also the clog? As far as I understand, the
clog contains some meta-information on the xlog, so presumably it is
flushed to disc synchronously together with the xlog? That would mean that
they each need a separate disk to prevent one disk having to seek too
often...?

You can put clog and xlog on same drive. That should be enough in most cases.
xlog is written sequentially and never read back other than for recovery
after a crash. clog is typically 8KB or a page and should not be an IO
overhead even in high traffic databases.

- Battery-backed write-cache for your SCSI controller can be a big
performance win

I probably won't be able to get such a setup for this project; that's why I
am bothering about which disk will be seeking how often.

As I said earlier, xlog is written sequentially and if I am not mistaken clog
as well. So there should not be much seeking if they are on a separate drive.

(Please correct me if I am wrong)

- Tablespaces _should_ be available in the next release of PG, we'll
know for sure soon. That might make life simpler for you if you do want
to spread your database around by hand,

Ok, I think tablespaces are not the important thing - at least for this
project of ours.

Well, if you have tablespaces, you don't have to mess with symlinking
clog/xlog or use location facility which is bit rough. You should be able to
manage such a setup solely from postgresql. That is an advantage of
tablespaces.

Here goes ... we are talking about a database cluster with two tables where
things are happening, one is a kind of log that is simply "appended" to and
will expect to reach a size of several million entries in the time window
that is kept, the other is a persistent backing of application data that
will mostly see read-modify-writes of single records. Two writers to the
history, one writer to the data table. The volume of data is not very high
and RAM is enough...

Even if you have enough RAM, you should use pg_autovacuum so that your tables
are in shape. This is especially required when your update/insert rate is
high.

If your history logs needs to be rotated, you can take advantage of the fact
that DDL's in postgresql are fully transacted. So you can drop the table in a
transaction but nobody will notice anything unless it is committed. Makes a
transparent rotation.

HTH

Shridhar

Tom Lane

tgl@sss.pgh.pa.us

almost 22 years ago

In reply to: Shridhar Daithankar (#4)

Re: filesystem option tuning

Shridhar Daithankar <shridhar@frodo.hserus.net> writes:

On Wednesday 19 May 2004 13:02, share-postgres@think42.com wrote:
- If you can put WAL on separate disk(s), all the better.

Does that mean only the xlog, or also the clog?

You can put clog and xlog on same drive.

You can, but I think you shouldn't. The entire argument for giving xlog
its own drive revolves around the fact that xlog is written
sequentially, and so if it has its own spindle then you have near-zero
seek requirements. As soon as you give that drive any other work to do,
you start losing the low-seek property.

Now as Shridhar says, clog is not a very high-I/O-volume thing, so in
one sense it doesn't much matter which drive you put it on. But it
seems to me that clog acts much more like ordinary table files than it
acts like xlog.

regards, tom lane