survey of WAL blocksize changes

Started by Mark Wongover 16 years ago9 messages
#1Mark Wong
markwkm@gmail.com

Hi all,

A long long time ago (in 2004) I ran a series of tests surveying the
results of changing BLCKSZ when it was used for both the WAL logs and
the rest of the database system:

http://archives.postgresql.org/pgsql-hackers/2004-03/msg01194.php

Now more than 5 years later and now being able to set the WAL log and
the rest of the database to different block sizes, I have a set of
test results with DBT-2 showing the effects of changing the WAL log
block size on OLTP transaction throughput on ext2, ranging from 1KB to
64KB:

BS notpm % Change from default
-- ----- ----------
1 14673 -4.8%
2 15864 2.9%
4 15774 2.3%
8 15413 (default)
16 16118 4.6%
32 16051 4.1%
64 14874 -3.5%

Pointers to raw data:

BS url
-- ---
1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/
2 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/
4 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/
8 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.2/
16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/

It appears for this workload using a 16KB or 32KB gets more than 4%
throughput improvement, but some of that could be noise. Nothing
quite jaw dropping yet. It'll be interesting to see if the
combination of changing the table block size can further improve the
performance. It will probably be interesting to try different
filesystems and filesystem blocksizes too.

Regards,
Mark Wong

#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Mark Wong (#1)
Re: survey of WAL blocksize changes

On Tue, 2009-05-26 at 19:51 -0700, Mark Wong wrote:

It appears for this workload using a 16KB or 32KB gets more than 4%
throughput improvement, but some of that could be noise.

The baseline appears to have a significant jump in txn response time
after 77 mins on the baseline test. I think you should rerun that. My
guess would be it will reduce any gains shown with higher settings.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Mark Wong (#1)
Re: survey of WAL blocksize changes

On Wednesday 27 May 2009 05:51:05 Mark Wong wrote:

BS notpm % Change from default
-- ----- ----------
1 14673 -4.8%
2 15864 2.9%
4 15774 2.3%
8 15413 (default)
16 16118 4.6%
32 16051 4.1%
64 14874 -3.5%

This means that both somewhat larger and somewhat smaller than 8 give better
results?!?

#4Greg Stark
greg.stark@enterprisedb.com
In reply to: Mark Wong (#1)
Re: survey of WAL blocksize changes

Why on earth would there be a dip precisely at 8k with both smaller
and larger block sizes being faster??

--
Greg

On 27 May 2009, at 03:51, Mark Wong <markwkm@gmail.com> wrote:

Show quoted text

Hi all,

A long long time ago (in 2004) I ran a series of tests surveying the
results of changing BLCKSZ when it was used for both the WAL logs and
the rest of the database system:

http://archives.postgresql.org/pgsql-hackers/2004-03/msg01194.php

Now more than 5 years later and now being able to set the WAL log and
the rest of the database to different block sizes, I have a set of
test results with DBT-2 showing the effects of changing the WAL log
block size on OLTP transaction throughput on ext2, ranging from 1KB to
64KB:

BS notpm % Change from default
-- ----- ----------
1 14673 -4.8%
2 15864 2.9%
4 15774 2.3%
8 15413 (default)
16 16118 4.6%
32 16051 4.1%
64 14874 -3.5%

Pointers to raw data:

BS url
-- ---
1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/
2 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/
4 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/
8 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.2/
16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/

It appears for this workload using a 16KB or 32KB gets more than 4%
throughput improvement, but some of that could be noise. Nothing
quite jaw dropping yet. It'll be interesting to see if the
combination of changing the table block size can further improve the
performance. It will probably be interesting to try different
filesystems and filesystem blocksizes too.

Regards,
Mark Wong

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Mark Wong
markwkm@gmail.com
In reply to: Simon Riggs (#2)
Re: survey of WAL blocksize changes

On Wed, May 27, 2009 at 1:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Tue, 2009-05-26 at 19:51 -0700, Mark Wong wrote:

It appears for this workload using a 16KB or 32KB gets more than 4%
throughput improvement, but some of that could be noise.

The baseline appears to have a significant jump in txn response time
after 77 mins on the baseline test. I think you should rerun that. My
guess would be it will reduce any gains shown with higher settings.

Oopsies. I've rerun, but now that there is no dip, the average
throughput still didn't change much:

BS notpm % Change from default
-- ----- ----------
1 14673 -5.1%
2 15864 2.7%
4 15774 2.1%
8 15454 (default)
16 16118 4.3%
32 16051 3.9%
64 14874 -3.8%

Pointers to raw data:

BS url
-- ---
1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/
2 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/
4 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/
8 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.8/report/
16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/

Regards,
Mark Wong

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Wong (#5)
Re: survey of WAL blocksize changes

Mark Wong <markwkm@gmail.com> writes:

Oopsies. I've rerun, but now that there is no dip, the average
throughput still didn't change much:

BS notpm % Change from default
-- ----- ----------
1 14673 -5.1%
2 15864 2.7%
4 15774 2.1%
8 15454 (default)
16 16118 4.3%
32 16051 3.9%
64 14874 -3.8%

So, if we assume that these numbers are real and not artifacts, it seems
we have to postulate at least four distinct block-size-dependent
performance effects:

1. A strong penalty for smaller block sizes, which becomes dominant
below 2KB.

2. A strong penalty for larger block sizes, which becomes dominant
above 32KB.

3. A weak benefit for smaller block sizes, which is visible at 2-4KB
but fades away at 8KB.

4. A weak benefit for larger block sizes, which only becomes visible
above 8KB.

It's not too hard to believe any of those individually, and even to
think of plausible mechanisms. But it seems a bit unlikely that effects
3 and 4 would exist but consistently cross over right at our traditional
choice of block size.

I'm suspecting that this curve is heavily dependent on details of the
DBT2 test and/or the hardware used. It would be interesting to see if
anyone can replicate it using a different benchmark.

regards, tom lane

#7Greg Stark
greg.stark@enterprisedb.com
In reply to: Tom Lane (#6)
Re: survey of WAL blocksize changes

I find it pretty hard to beleive that 8k is precisely where a drop in
performance shows up unless there's some peculiar reason.

The only peculiar reason I can imagine is full page writes. If the
dbt2 workload is modifying already full pages then the full page
writes will always be just shy of a complete page and with the xlog
record might consistently be just larger than a full block.

I'm not immediately sure why that would cause a problem but it's been
a while since I traced through the xlog code.

--
Greg

On 28 May 2009, at 02:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Mark Wong <markwkm@gmail.com> writes:

Oopsies. I've rerun, but now that there is no dip, the average
throughput still didn't change much:

BS notpm % Change from default
-- ----- ----------
1 14673 -5.1%
2 15864 2.7%
4 15774 2.1%
8 15454 (default)
16 16118 4.3%
32 16051 3.9%
64 14874 -3.8%

So, if we assume that these numbers are real and not artifacts, it
seems
we have to postulate at least four distinct block-size-dependent
performance effects:

1. A strong penalty for smaller block sizes, which becomes dominant
below 2KB.

2. A strong penalty for larger block sizes, which becomes dominant
above 32KB.

3. A weak benefit for smaller block sizes, which is visible at 2-4KB
but fades away at 8KB.

4. A weak benefit for larger block sizes, which only becomes visible
above 8KB.

It's not too hard to believe any of those individually, and even to
think of plausible mechanisms. But it seems a bit unlikely that
effects
3 and 4 would exist but consistently cross over right at our
traditional
choice of block size.

I'm suspecting that this curve is heavily dependent on details of the
DBT2 test and/or the hardware used. It would be interesting to see if
anyone can replicate it using a different benchmark.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#6)
Re: survey of WAL blocksize changes

On Wed, 2009-05-27 at 21:09 -0400, Tom Lane wrote:

So, if we assume that these numbers are real and not artifacts, it seems
we have to postulate at least four distinct block-size-dependent
performance effects:

Two performance effects would be sufficient to explain the results.

* Optimal performance for small WAL changes is reached at around 4kB.
Anything smaller or larger lessens the benefit from this.

* Optimal performance for full page writes is reached at a WAL block
size 2-4 times larger than db block size, corresponding to sizes of WAL
records generated by test.

The two effects have a tail off on either side, giving the four effects
you spoke of.

It's not too hard to believe any of those individually, and even to
think of plausible mechanisms. But it seems a bit unlikely that effects
3 and 4 would exist but consistently cross over right at our traditional
choice of block size.

I could believe two, but we would need some careful instrumentation to
reveal at what times we got benefit. We will never achieve improvements
if we look at figures averaged over longer periods.

We should be trying to improve specific parts of the checkpoint cycle,
which I would break down like this:
* ramp-up
* checkpoint spike
* post-checkpoint trough
* normal running
There is very clear modal behaviour showing in the tests and we should
look at the effects of patches in each case. I could well believe that
we make a gain at one stage and lose on another.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Mark Wong (#5)
Re: survey of WAL blocksize changes

On Wed, 2009-05-27 at 17:51 -0700, Mark Wong wrote:

On Wed, May 27, 2009 at 1:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Tue, 2009-05-26 at 19:51 -0700, Mark Wong wrote:

It appears for this workload using a 16KB or 32KB gets more than 4%
throughput improvement, but some of that could be noise.

The baseline appears to have a significant jump in txn response time
after 77 mins on the baseline test. I think you should rerun that. My
guess would be it will reduce any gains shown with higher settings.

Oopsies. I've rerun, but now that there is no dip, the average
throughput still didn't change much:

BS notpm % Change from default
-- ----- ----------
1 14673 -5.1%
2 15864 2.7%
4 15774 2.1%
8 15454 (default)
16 16118 4.3%
32 16051 3.9%
64 14874 -3.8%

Pointers to raw data:

BS url
-- ---
1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/
2 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/
4 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/
8 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.8/report/
16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/

Look at these graphs, in this order
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/report/rt_d.png
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/report/rt_d.png
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.8/report/rt_d.png

BS 16 and 2 look very similar, though with 16 clearly a better curve.
BS=8 looks very strange in comparison. Still something wrong, I suspect.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support