Weird disk write load caused by PostgreSQL?

Started by Alexander Stauboover 19 years ago4 messagesgeneral
Jump to latest
#1Alexander Staubo
alex@purefiction.net

I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that
seems to be writing data to disk at rates that I think are
disproportional to the update load imposed on the database. I am
looking for ways to determine the cause of this I/O.

As an example, here is a typical graph produced by Munin:

http://purefiction.net/paste/pg-iostat.png

Running an hourly iostat produces this output:

Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda 43.50 0.21 0.04 427915 72736
sda 43.62 0.21 0.04 428183 88904
sda 43.74 0.21 0.05 428440 104877
sda 43.90 0.21 0.06 428808 124681
sda 44.06 0.21 0.07 429111 145447
sda 44.27 0.21 0.08 429532 170317
sda 44.46 0.21 0.09 429985 193594

In other words, it's reading about 400MB/hour and writing around
15-20GB/hour, or exactly 118GB during the last six hours. To
determine how well this correlates to the actual inserts and updates
being performed on the database, I ran a loop alongside iostat that
executed "select sum(n_tup_upd), sum(n_tup_ins) from
pg_stat_all_tables" against PostgreSQL every hour and output the
difference. Here are a few samples:

| delta_upd | delta_ins |
+-----------+-----------+
| 7111 | 2343 |
| 7956 | 2302 |
| 7876 | 2181 |
| 9269 | 2477 |
| 8553 | 2205 |

For the write numbers to match the tuple numbers, each updated/
inserted tuple would have to average at least 1.5MB (15 GB divided by
10,000 tuples), which is not the case; the total size of the raw
tuples updated/inserted during the above session probably does not
exceed a couple of megabytes. Even considering overhead, page size,
MVCC, etc., this does not compute.

I have not narrowed this explicitly down to PostgreSQL, but since the
stock Linux kernel we are running does not offer per-process I/O
statistics, I cannot determine this for sure. However, except for the
PostgreSQL database, everything else on the box should be identical
to what we are running on other boxes, which are not exhibiting the
same kind of load. Oh, and I have PostgreSQL logging turned off.

Note that PostgreSQL's performance in itself seems fine, and
according to top/ps it's only very rarely in iowait.

Alexander.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Staubo (#1)
Re: Weird disk write load caused by PostgreSQL?

Alexander Staubo <alex@purefiction.net> writes:

I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that
seems to be writing data to disk at rates that I think are
disproportional to the update load imposed on the database. I am
looking for ways to determine the cause of this I/O.

Are you sure that iostat is to be trusted? The read numbers in
particular look suspiciously uniform ... it would be a strange
query load that would create a read demand changing less than 1%
from hour to hour, unless perhaps that represented the disk's
saturation point, which is not the case if you're not seeing
obvious performance problems.

regards, tom lane

#3Alexander Staubo
alex@purefiction.net
In reply to: Tom Lane (#2)
Re: Weird disk write load caused by PostgreSQL?

On Oct 2, 2006, at 17:50 , Tom Lane wrote:

Alexander Staubo <alex@purefiction.net> writes:

I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that
seems to be writing data to disk at rates that I think are
disproportional to the update load imposed on the database. I am
looking for ways to determine the cause of this I/O.

Are you sure that iostat is to be trusted?

No. :) But iostat reads directly from /dev/diskstats, which should be
reliable. Of course, it still doesn't say anything about which
process is doing the writing; for that I would need to install the
atop kernel patches or similar.

...

The read numbers in
particular look suspiciously uniform ... it would be a strange
query load that would create a read demand changing less than 1%
from hour to hour, unless perhaps that represented the disk's
saturation point, which is not the case if you're not seeing
obvious performance problems.

They are not uniform at all -- they correlate perfectly with the web
traffic; it just so happens that the samples I quoted were from peak
hours. Take a look at the Munin graph. (The spikes correspond to
scheduled maintenance tasks like backups.)

Alexander.

#4Alexander Staubo
alex@purefiction.net
In reply to: Alexander Staubo (#3)
Re: Weird disk write load caused by PostgreSQL?

I'm surprised that I have to reply to myself, since in hindsight this
should be bloody obvious: It's the pgsql_tmp directory. I just
monitored the file creation in that directory, and found PostgreSQL
to be creating huge temporary, extremely short-lived files ranging
from 1MB to 20MB in size. I increased work_mem to 25MB -- should be
perfectly safe on a box with 4GB of RAM, I hope - and the iostat
traffic seems to have dropped to near zero.

I'm a bit baffled as to why PostgreSQL would ever be sorting 20MB of
data in the first place, even with ~12 connections running queries
concurrently, but I suppose I will have to look more closely at our
query patterns.

Alexander.

On Oct 2, 2006, at 19:25 , Alexander Staubo wrote:

Show quoted text

On Oct 2, 2006, at 17:50 , Tom Lane wrote:

Alexander Staubo <alex@purefiction.net> writes:

I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that
seems to be writing data to disk at rates that I think are
disproportional to the update load imposed on the database. I am
looking for ways to determine the cause of this I/O.

Are you sure that iostat is to be trusted?

No. :) But iostat reads directly from /dev/diskstats, which should
be reliable. Of course, it still doesn't say anything about which
process is doing the writing; for that I would need to install the
atop kernel patches or similar.

...

The read numbers in
particular look suspiciously uniform ... it would be a strange
query load that would create a read demand changing less than 1%
from hour to hour, unless perhaps that represented the disk's
saturation point, which is not the case if you're not seeing
obvious performance problems.

They are not uniform at all -- they correlate perfectly with the
web traffic; it just so happens that the samples I quoted were from
peak hours. Take a look at the Munin graph. (The spikes correspond
to scheduled maintenance tasks like backups.)

Alexander.

---------------------------(end of
broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match